In many semiparametric models that are parameterized by two 
types of parameters — a Euclidean parameter of interest and an infinite- 
dimensional nuisance parameter — the two parameters are bundled to- 
gether, that is, the nuisance parameter is an unknown function that 
contains the parameter of interest as part of its argument. For ex- 
ample, in a linear regression model for censored survival data, the 
unspecified error distribution function involves the regression coef- 
ficients. Motivated by developing an efficient estimating method for 
the regression parameters, we propose a general sieve M-theorem for 
bundled parameters and apply the theorem to deriving the asymp- 
totic theory for the sieve maximum likelihood estimation in the lin- 
ear regression model for censored survival data. The numerical im- 
plementation of the proposed estimating method can be achieved 
through the conventional gradient-based search algorithms such as 
the Newton-Raphson algorithm. We show that the proposed estima- 
tor is consistent and asymptotically normal and achieves the semi- 
parametric efficiency bound. Simulation studies demonstrate that the 
proposed method performs well in practical settings and yields more 
efficient estimates than existing estimating-equation-based methods. 
Illustration with a real data example is also provided. 
1. Introduction. In a semiparametric model that is parameterized by 
two types of parameters — a finite-dimensional Euclidean parameter and an 
infinite-dimensional parameter — oftentimes the infinite-dimensional param- 
eter is considered as a nuisance parameter, and the two parameters are 
Y. DING AND B. NAN 

separated. In many interesting statistical models, however, the parameter 
of interest and the nuisance parameter are bundled together, a terminology 
used by Huang and Wellner [13] when they reviewed the linear models under 
interval censoring, which means that the infinite-dimensional parameter is 
an unknown function of the parameter of interest. For example, in a linear 
regression model for censored survival data, the unspecified error distribu- 
tion function, often treated as a nuisance parameter, is a function of the 
regression coefficients. Other examples include the single index model and 
the Cox regression model with an unspecified link function. 

There is a rich literature of asymptotic distributional theories for M- 
estimation in a variety of semiparametric models with well-separated pa- 
rameters; see, for example, [10-12, 24, 30, 33], among many others. Though 
many methodologies of M-estimation for bundled parameters have been pro- 
posed in the literature, general asymptotic distributional theories for such 
problems are still lacking. The only estimation theories for bundled param- 
eters we are aware of are the sieve generalized method of moment of [1] and 
the estimating equation approach of [5, 19]. 

In this article, we consider an extension of existing asymptotic distribu- 
tional theories to accommodate situations where the estimation criteria are 
parameterized with bundled parameters. The proposed theory has a flavor similar 
to that of Theorem 2 in [5], but the two differ because the latter requires 
an existing uniformly consistent estimator of the infinite-dimensional nuisance 
parameter with a convergence rate faster than $n^{-1/4}$, which is then treated 
as a fixed function of the parameter of interest in their estimating procedure, 
while we need to simultaneously estimate both parameters through a sieve 
parameter space; furthermore, their existing nuisance parameter estimator 
needs to satisfy their condition (2.6), which is usually hard to verify when 
its convergence rate is slower than $n^{-1/2}$. Our proposed theory is general 
enough to cover a wide range of problems for bundled parameters including 
the aforementioned single index model, the Cox model with unknown link 
function and a linear model under different censoring mechanisms. Rigor- 
ous proofs for each of the models, however, will take lengthy derivations. 
We only use the efficient estimation in the semiparametric linear regression 
model with right censored data as an illustrative example that motivates 
such a theoretical development and will present results for other models 
elsewhere. Note that the considered example cannot be directly put into 
the framework of restricted moments due to right censoring, thus cannot be 
handled by the method of [1]. 

Suppose that the failure time transformed by a known monotone trans- 
formation is linearly related to a set of covariates, where the failure time 
is subject to right censoring. Let $T_i$ denote the transformed failure time 
and $C_i$ denote the transformed censoring time by the same transformation 



M-THEOREM FOR BUNDLED PARAMETERS 

for subject $i$, $i = 1, \ldots, n$. Let $Y_i = \min(T_i, C_i)$ and $\Delta_i = I(T_i \le C_i)$. Then 
the semiparametric linear model we consider here can be written as 

$$T_i = X_i'\beta_0 + e_{0,i}, \qquad i = 1, \ldots, n, \tag{1.1}$$

where the errors $e_{0,i}$ are independent and identically distributed (i.i.d.) with 
an unspecified distribution. When the failure time is log-transformed, this 
model corresponds to the well-known accelerated failure time model [16]. 
Here we assume that $(X_i, C_i)$, $i = 1, \ldots, n$, are i.i.d. and independent of $e_{0,i}$. 
This is a common assumption for linear models with censored survival data, 
which is particularly needed in [22] to derive the efficient score function 
for $\beta_0$. Such an assumption, however, is stronger than necessary in the usual 
linear regression without censoring, for which the error is only required to 
be uncorrelated with covariates; see, for example, [3]. We also avoid trivial 
transformations such as $\log(0)$ so that we always have the $Y_i$'s bounded from 
below. 

The semiparametric linear regression model relates the failure time to 
the covariates directly. It provides a straightforward interpretation of the 
data and serves as an attractive alternative to the Cox model [6] in many 
applications. Several estimators of the regression parameters have been proposed 
in the literature since the late 1970s, including the rank-based estimators 
(see, e.g., [14, 15, 20, 26, 29, 31]) and the Buckley-James estimator (see, 
e.g., [2, 17, 21]). There are two major challenges in the estimation for such 
a linear model: (1) the estimating functions in the aforementioned meth- 
ods are discrete, leading to potential multiple solutions as well as numerical 
difficulties; (2) none of the aforementioned methods is efficient. Recently, 
Zeng and Lin [32] developed a kernel-smoothed profile likelihood estimating 
procedure for the accelerated failure time model. In this article, we consider 
a sieve maximum likelihood approach for model (1.1) for censored data. 
The proposed approach is intuitive, easy to implement numerically 
and asymptotically efficient. 

It is easy to see that $T$ and $C$ are independent conditional on $X$ under the 
assumption $e_0 \perp (C, X)$. Hence the joint density function of $Z = (Y, \Delta, X)$ 
can be written as 

$$f_{Y,\Delta,X}(y,\delta,x) = \lambda_0(y - x'\beta_0)^{\delta}\exp\{-\Lambda_0(y - x'\beta_0)\}\,H(y,\delta,x), \tag{1.2}$$

where $\Lambda_0(\cdot)$ is the true cumulative hazard function for the error term $e_0$ 
and $\lambda_0(\cdot)$ is its derivative. $H(y,\delta,x)$ only depends on the conditional 
distribution of $C$ given $X$ and the marginal distribution of $X$, and is free 
of $\beta_0$ and $\lambda_0$. To simplify the notation, we will ignore the factor $H$ from 
the likelihood function. Then for i.i.d. observations $(Y_i, \Delta_i, X_i)$, $i = 1, \ldots, n$, 
from (1.2) we obtain the log likelihood function for $\beta$ and $\lambda$ as 

$$l_n(\beta,\lambda) = n^{-1}\sum_{i=1}^{n}\Bigl[\Delta_i\log\{\lambda(Y_i - X_i'\beta)\} - \int I(Y_i \ge t)\,\lambda(t - X_i'\beta)\,dt\Bigr]. \tag{1.3}$$
The log likelihood given in (1.3) is that of a semiparametric model, where 
the argument of the nuisance parameter $\lambda$ involves $\beta$; thus $\beta$ and $\lambda$ are 
bundled parameters. To keep the positivity of $\lambda$, let $g(\cdot) = \log\lambda(\cdot)$. Then the 
log likelihood function for $\beta$ and $g$, using the counting process notation, can 
be written as 

$$l_n(\beta,g) = n^{-1}\sum_{i=1}^{n}\Bigl[\int g(t - X_i'\beta)\,dN_i(t) - \int I(Y_i \ge t)\,e^{g(t - X_i'\beta)}\,dt\Bigr], \tag{1.4}$$

where $N_i(t) = \Delta_i I(Y_i \le t)$ is the counting process for subject $i$. 
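
As a quick check that the counting process form (1.4) agrees with (1.3), note that the counting process for subject $i$ has a single jump of size one at the observed time when the subject is uncensored and no jump otherwise. The following lines are a short sketch of that step, written in the notation of this section; nothing beyond the definitions above is assumed.

```latex
% N_i jumps only at t = Y_i, and only when \Delta_i = 1, so the stochastic integral is a single term:
\int g(t - X_i'\beta)\,dN_i(t) \;=\; \Delta_i\, g(Y_i - X_i'\beta) \;=\; \Delta_i \log\lambda(Y_i - X_i'\beta),
% and the compensator term is unchanged because e^{g} = \lambda:
\int I(Y_i \ge t)\, e^{g(t - X_i'\beta)}\,dt \;=\; \int I(Y_i \ge t)\,\lambda(t - X_i'\beta)\,dt .
```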

We propose a new approach by directly maximizing the log likelihood 
function in a sieve space in which function g(-) is approximated by B-splines. 
Numerically, the estimator can be easily obtained by the Newton-Raphson 
algorithm or any other gradient-based search algorithm. We show that the proposed 
estimator is consistent and asymptotically normal, and the limiting 
covariance matrix reaches the semiparametric efficiency bound, which can be 
estimated either by inverting the information matrix based on the efficient 
score function of the regression parameters derived by [22], or by inverting 
the observed information matrix of all parameters, taking into account that 
we are also estimating the nuisance parameters in the sieve space for the log 
hazard function. 

2. The sieve M-theorem on the asymptotic normality of semiparametric 
estimation for bundled parameters. In this section, we extend the general 
theorem introduced by [30], which deals with the asymptotic normality of 
semiparametric M-estimators of regression parameters when the convergence 
rate of the estimator for nuisance parameters can be slower than $n^{-1/2}$. In 
their theorem, the parameters of interest and the nuisance parameters are 
assumed to be separated. We consider a more general setting where the 
nuisance parameter can be a function of the parameters of interest. The 
theorem is crucial in the proof of asymptotic normality given in Theorem 4.2 
for our proposed estimators. 

Some empirical process notation will be used from now on. We denote 
$Pf = \int f(z)\,dP(z)$ and $\mathbb{P}_n f = n^{-1}\sum_{i=1}^{n} f(Z_i)$, where $P$ is a probability 
measure and $\mathbb{P}_n$ is an empirical probability measure, and denote $\mathbb{G}_n f = 
n^{1/2}(\mathbb{P}_n - P)f$. Given i.i.d. observations $Z_1, Z_2, \ldots, Z_n \in \mathcal{Z}$, we estimate 
the unknown parameters $(\beta, \zeta(\cdot,\beta))$ by maximizing an objective function 
for $(\beta, \zeta(\cdot,\beta))$, $n^{-1}\sum_{i=1}^{n} m(\beta,\zeta(\cdot,\beta);Z_i) = \mathbb{P}_n m(\beta,\zeta(\cdot,\beta);Z)$, where $\beta$ is the 
parameter of interest, and $\zeta(\cdot,\beta)$ is the nuisance parameter that can be a 
function of $\beta$. Here "$\cdot$" denotes the other arguments of $\zeta$ besides $\beta$, which 
can be some components of $Z \in \mathcal{Z}$. If the objective function $m$ is the log-likelihood 
function of a single observation, then the estimator becomes the 
semiparametric maximum likelihood estimator. Here we adopt notation similar 
to that in [30]. 
Let $\theta = (\beta, \zeta(\cdot,\beta))$, $\beta \in \mathcal{B} \subset \mathbb{R}^d$ and $\zeta \in \mathcal{H}$, where $\mathcal{B}$ is the parameter space 
of $\beta$, and $\mathcal{H}$ is a class of functions mapping from $\mathcal{Z} \times \mathcal{B}$ to $\mathbb{R}$. Let $\Theta = \mathcal{B} \times \mathcal{H}$ 
be the parameter space of $\theta$. Define a distance between $\theta_1, \theta_2 \in \Theta$ by 

$$d(\theta_1,\theta_2) = \{|\beta_2 - \beta_1|^2 + \|\zeta_2(\cdot,\beta_2) - \zeta_1(\cdot,\beta_1)\|^2\}^{1/2},$$

where $|\cdot|$ is the Euclidean distance, and $\|\cdot\|$ is some norm. Let $\Theta_n$ be the 
sieve parameter space, a sequence of increasing subsets of the parameter 
space growing dense in $\Theta$ as $n \to \infty$. We aim to find $\hat\theta_n \in \Theta_n$ such that 
$d(\hat\theta_n,\theta_0) = o_p(1)$ and $\hat\beta_n$ is asymptotically normal. 

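For any fixed $\zeta(\cdot,\beta) \in \mathcal{H}$, let $\{\zeta_\eta(\cdot,\beta) : \eta$ in a neighborhood of $0 \in \mathbb{R}\}$ be 
a smooth curve in $\mathcal{H}$ running through $\zeta(\cdot,\beta)$ at $\eta = 0$, that is, $\zeta_\eta(\cdot,\beta)|_{\eta=0} = 
\zeta(\cdot,\beta)$. Assume all $\zeta(\cdot,\beta) \in \mathcal{H}$ are at least twice-differentiable with respect 
to $\beta$, and denote 

$$\mathbb{H} = \Bigl\{h : h(\cdot,\beta) = \frac{\partial\zeta_\eta(\cdot,\beta)}{\partial\eta}\Big|_{\eta=0},\ \zeta_\eta \in \mathcal{H}\Bigr\}.$$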

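Assume the objective function $m$ is twice Fréchet differentiable. Since for 
a small $\delta$, we have $\zeta(\cdot,\beta+\delta) - \zeta(\cdot,\beta) = \zeta_\beta(\cdot,\beta)\delta + o(\delta)$, where $\zeta_\beta(\cdot,\beta) = 
\partial\zeta(\cdot,\beta)/\partial\beta$, by the definition of functional derivatives it follows that 

$$
\begin{aligned}
&\lim_{\delta\to 0}\frac{1}{\delta}\{m(\beta,\zeta(\cdot,\beta+\delta);z) - m(\beta,\zeta(\cdot,\beta);z)\} \\
&\quad= \lim_{\delta\to 0}\frac{1}{\delta}\{m(\beta,\zeta(\cdot,\beta) + \zeta_\beta(\cdot,\beta)\delta + o(\delta);z) - m(\beta,\zeta(\cdot,\beta) + \zeta_\beta(\cdot,\beta)\delta;z)\} \\
&\qquad+ \lim_{\delta\to 0}\frac{1}{\delta}\{m(\beta,\zeta(\cdot,\beta) + \zeta_\beta(\cdot,\beta)\delta;z) - m(\beta,\zeta(\cdot,\beta);z)\} \\
&\quad= \lim_{\delta\to 0} m_2(\beta,\zeta(\cdot,\beta) + \zeta_\beta(\cdot,\beta)\delta;z)[o(\delta)/\delta] + m_2(\beta,\zeta(\cdot,\beta);z)[\zeta_\beta(\cdot,\beta)] \\
&\quad= m_2(\beta,\zeta(\cdot,\beta);z)[\zeta_\beta(\cdot,\beta)],
\end{aligned}
$$

where the subscript 2 indicates that the derivatives are taken with respect 
to the second argument of the function. The last equality holds because 

$$\lim_{\delta\to 0} m_2(\beta,\zeta(\cdot,\beta) + \zeta_\beta(\cdot,\beta)\delta;z)[o(\delta)/\delta] = 0.$$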



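Similarly we have 

$$
\begin{aligned}
&\lim_{\delta\to 0}\frac{1}{\delta}\{m_2(\beta,\zeta(\cdot,\beta+\delta);z)[h(\cdot,\beta)] - m_2(\beta,\zeta(\cdot,\beta);z)[h(\cdot,\beta)]\} \\
&\quad= m_{22}(\beta,\zeta(\cdot,\beta);z)[h(\cdot,\beta),\zeta_\beta(\cdot,\beta)]
\end{aligned}
$$

and 

$$
\begin{aligned}
&\lim_{\delta\to 0}\frac{1}{\delta}\{m_2(\beta,\zeta(\cdot,\beta);z)[h(\cdot,\beta+\delta)] - m_2(\beta,\zeta(\cdot,\beta);z)[h(\cdot,\beta)]\} \\
&\quad= m_2(\beta,\zeta(\cdot,\beta);z)[h_\beta(\cdot,\beta)].
\end{aligned}
$$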
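Thus according to the chain rule of the functional derivatives, we have 

$$
\begin{aligned}
m_\beta(\beta,\zeta(\cdot,\beta);z) &= \frac{\partial m(\beta,\zeta(\cdot,\beta);z)}{\partial\beta} \\
&= m_1(\beta,\zeta(\cdot,\beta);z) + m_2(\beta,\zeta(\cdot,\beta);z)[\zeta_\beta(\cdot,\beta)], \\
m_\zeta(\beta,\zeta(\cdot,\beta);z)[h] &= \frac{\partial m(\beta,(\zeta+\eta h)(\cdot,\beta);z)}{\partial\eta}\Big|_{\eta=0} \\
&= m_2(\beta,\zeta(\cdot,\beta);z)[h(\cdot,\beta)], \\
m_{\beta\beta}(\beta,\zeta(\cdot,\beta);z) &= \frac{\partial^2 m(\beta,\zeta(\cdot,\beta);z)}{\partial\beta\,\partial\beta'} = \frac{\partial m_\beta(\beta,\zeta(\cdot,\beta);z)}{\partial\beta'} \\
&= m_{11}(\beta,\zeta(\cdot,\beta);z) + m_{12}(\beta,\zeta(\cdot,\beta);z)[\zeta_\beta(\cdot,\beta)] \\
&\quad+ m_{21}(\beta,\zeta(\cdot,\beta);z)[\zeta_\beta(\cdot,\beta)] \\
&\quad+ m_{22}(\beta,\zeta(\cdot,\beta);z)[\zeta_\beta(\cdot,\beta),\zeta_\beta(\cdot,\beta)] \\
&\quad+ m_2(\beta,\zeta(\cdot,\beta);z)[\zeta_{\beta\beta}(\cdot,\beta)], \\
m_{\beta\zeta}(\beta,\zeta(\cdot,\beta);z)[h] &= \frac{\partial m_\beta(\beta,(\zeta+\eta h)(\cdot,\beta);z)}{\partial\eta}\Big|_{\eta=0} \\
&= m_{12}(\beta,\zeta(\cdot,\beta);z)[h(\cdot,\beta)] \\
&\quad+ m_{22}(\beta,\zeta(\cdot,\beta);z)[\zeta_\beta(\cdot,\beta),h(\cdot,\beta)] \\
&\quad+ m_2(\beta,\zeta(\cdot,\beta);z)[h_\beta(\cdot,\beta)], \\
m_{\zeta\beta}(\beta,\zeta(\cdot,\beta);z)[h] &= \frac{\partial m_2(\beta,\zeta(\cdot,\beta);z)[h(\cdot,\beta)]}{\partial\beta} \\
&= m_{21}(\beta,\zeta(\cdot,\beta);z)[h(\cdot,\beta)] \\
&\quad+ m_{22}(\beta,\zeta(\cdot,\beta);z)[h(\cdot,\beta),\zeta_\beta(\cdot,\beta)] \\
&\quad+ m_2(\beta,\zeta(\cdot,\beta);z)[h_\beta(\cdot,\beta)], \\
m_{\zeta\zeta}(\beta,\zeta(\cdot,\beta);z)[h_1,h_2] &= m_{22}(\beta,\zeta(\cdot,\beta);z)[h_1(\cdot,\beta),h_2(\cdot,\beta)].
\end{aligned}
$$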
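
To make the chain-rule terms concrete, the lines below work through the bundled structure that arises in the linear model, where the nuisance function is a univariate function evaluated at a residual. This is only an illustrative sketch: the form $\zeta(t,x,\beta) = g(t - x'\beta)$ and the direction $w$ are assumed here for the example, with dots denoting derivatives of univariate functions.

```latex
% Assumed bundled form: \zeta(t,x,\beta) = g(t - x'\beta), direction h(t,x,\beta) = w(t - x'\beta)
\zeta_\beta(t,x,\beta) = \frac{\partial}{\partial\beta}\,g(t - x'\beta) = -x\,\dot g(t - x'\beta),
\qquad
h_\beta(t,x,\beta) = \frac{\partial}{\partial\beta}\,w(t - x'\beta) = -x\,\dot w(t - x'\beta),
% so the total derivative in \beta picks up an extra term through the argument of g:
m_\beta(\beta,\zeta(\cdot,\beta);z)
  = m_1(\beta,\zeta(\cdot,\beta);z)
  + m_2(\beta,\zeta(\cdot,\beta);z)\bigl[-x\,\dot g(t - x'\beta)\bigr].
```

When $g$ does not depend on $\beta$ through its argument, both extra terms vanish and only the partial derivatives $m_1$ and $m_2$ remain.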

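As noted before, the subscript 1 or 2 in the derivatives indicates that 
the derivatives are taken with respect to the first or the second argument 
of the function, and $h$ inside the square brackets is a function denoting the 
direction of the functional derivative with respect to $\zeta$. Note that for the 
second derivatives $m_{\beta\zeta}$ and $m_{\zeta\beta}$, we implicitly require the direction $h$ to be 
a differentiable function with respect to $\beta$. It is easily seen that when $\zeta$ is 
free of $\beta$, all the above derivatives reduce to those in [30]. Following [30], we 
also define 

$$
\begin{aligned}
S_\beta(\beta,\zeta(\cdot,\beta)) &= P m_\beta(\beta,\zeta(\cdot,\beta);Z), & S_\zeta(\beta,\zeta(\cdot,\beta))[h] &= P m_\zeta(\beta,\zeta(\cdot,\beta);Z)[h], \\
S_{\beta,n}(\beta,\zeta(\cdot,\beta)) &= \mathbb{P}_n m_\beta(\beta,\zeta(\cdot,\beta);Z), & S_{\zeta,n}(\beta,\zeta(\cdot,\beta))[h] &= \mathbb{P}_n m_\zeta(\beta,\zeta(\cdot,\beta);Z)[h], \\
S_{\beta\beta}(\beta,\zeta(\cdot,\beta)) &= P m_{\beta\beta}(\beta,\zeta(\cdot,\beta);Z), & S_{\zeta\zeta}(\beta,\zeta(\cdot,\beta))[h_1,h_2] &= P m_{\zeta\zeta}(\beta,\zeta(\cdot,\beta);Z)[h_1,h_2],
\end{aligned}
$$

and 

$$S_{\beta\zeta}(\beta,\zeta(\cdot,\beta))[h] = S_{\zeta\beta}'(\beta,\zeta(\cdot,\beta))[h] = P m_{\beta\zeta}(\beta,\zeta(\cdot,\beta);Z)[h].$$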
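Furthermore, for $\mathbf{h} = (h_1, h_2, \ldots, h_d)' \in \mathbb{H}^d$, we denote 

$$
\begin{aligned}
m_\zeta(\beta,\zeta(\cdot,\beta);z)[\mathbf{h}] &= (m_\zeta(\beta,\zeta(\cdot,\beta);z)[h_1],\ldots,m_\zeta(\beta,\zeta(\cdot,\beta);z)[h_d])', \\
m_{\zeta\beta}(\beta,\zeta(\cdot,\beta);z)[\mathbf{h}] &= (m_{\zeta\beta}(\beta,\zeta(\cdot,\beta);z)[h_1],\ldots,m_{\zeta\beta}(\beta,\zeta(\cdot,\beta);z)[h_d])', \\
m_{\zeta\zeta}(\beta,\zeta(\cdot,\beta);z)[\mathbf{h},h] &= (m_{\zeta\zeta}(\beta,\zeta(\cdot,\beta);z)[h_1,h],\ldots,m_{\zeta\zeta}(\beta,\zeta(\cdot,\beta);z)[h_d,h])',
\end{aligned}
$$

and define correspondingly 

$$
\begin{aligned}
S_\zeta(\beta,\zeta(\cdot,\beta))[\mathbf{h}] &= P m_\zeta(\beta,\zeta(\cdot,\beta);Z)[\mathbf{h}], \\
S_{\zeta,n}(\beta,\zeta(\cdot,\beta))[\mathbf{h}] &= \mathbb{P}_n m_\zeta(\beta,\zeta(\cdot,\beta);Z)[\mathbf{h}], \\
S_{\beta\zeta}(\beta,\zeta(\cdot,\beta))[\mathbf{h}] &= P m_{\beta\zeta}(\beta,\zeta(\cdot,\beta);Z)[\mathbf{h}], \\
S_{\zeta\beta}(\beta,\zeta(\cdot,\beta))[\mathbf{h}] &= P m_{\zeta\beta}(\beta,\zeta(\cdot,\beta);Z)[\mathbf{h}], \\
S_{\zeta\zeta}(\beta,\zeta(\cdot,\beta))[\mathbf{h},h] &= P m_{\zeta\zeta}(\beta,\zeta(\cdot,\beta);Z)[\mathbf{h},h].
\end{aligned}
$$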

To obtain the asymptotic normality result for the sieve M-estimator $\hat\beta_n$, 
the assumptions we will make in the following look similar to those in [30], 
but all the derivatives with respect to $\beta$ involve the chain rule and hence 
are more complicated, which is the key difference from [30]. Additionally, we 
focus on sieve estimators in the sieve parameter space. We list the following 
assumptions: 
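(A1) (Rate of convergence) For an estimator $\hat\theta_n = (\hat\beta_n,\hat\zeta_n(\cdot,\hat\beta_n)) \in \Theta_n$ 
and the true parameter $\theta_0 = (\beta_0,\zeta_0(\cdot,\beta_0)) \in \Theta$, $d(\hat\theta_n,\theta_0) = O_p(n^{-\xi})$ for some 
$\xi > 0$. 

(A2) $S_\beta(\beta_0,\zeta_0(\cdot,\beta_0)) = 0$ and $S_\zeta(\beta_0,\zeta_0(\cdot,\beta_0))[h] = 0$ for all $h \in \mathbb{H}$. 

(A3) (Positive information) There exists an $\mathbf{h}^* = (h_1^*, \ldots, h_d^*)'$, where 
$h_j^* \in \mathbb{H}$ for $j = 1, \ldots, d$, such that 

$$S_{\beta\zeta}(\beta_0,\zeta_0(\cdot,\beta_0))[h] - S_{\zeta\zeta}(\beta_0,\zeta_0(\cdot,\beta_0))[\mathbf{h}^*,h] = 0$$

for all $h \in \mathbb{H}$. Furthermore, the matrix 

$$
\begin{aligned}
A &= -S_{\beta\beta}(\beta_0,\zeta_0(\cdot,\beta_0)) + S_{\zeta\beta}(\beta_0,\zeta_0(\cdot,\beta_0))[\mathbf{h}^*] \\
&= -P\{m_{\beta\beta}(\beta_0,\zeta_0(\cdot,\beta_0);Z) - m_{\zeta\beta}(\beta_0,\zeta_0(\cdot,\beta_0);Z)[\mathbf{h}^*]\}
\end{aligned}
$$

is nonsingular. 

(A4) The estimator $(\hat\beta_n,\hat\zeta_n(\cdot,\hat\beta_n))$ satisfies 

$$S_{\beta,n}(\hat\beta_n,\hat\zeta_n(\cdot,\hat\beta_n)) = o_p(n^{-1/2}) \quad\text{and}\quad S_{\zeta,n}(\hat\beta_n,\hat\zeta_n(\cdot,\hat\beta_n))[\mathbf{h}^*] = o_p(n^{-1/2}).$$

(A5) (Stochastic equicontinuity) For some $C > 0$, 

$$\sup_{d(\theta,\theta_0)\le Cn^{-\xi},\,\theta\in\Theta_n}\bigl|\sqrt{n}(S_{\beta,n} - S_\beta)(\beta,\zeta(\cdot,\beta)) - \sqrt{n}(S_{\beta,n} - S_\beta)(\beta_0,\zeta_0(\cdot,\beta_0))\bigr| = o_p(1)$$

and 

$$\sup_{d(\theta,\theta_0)\le Cn^{-\xi},\,\theta\in\Theta_n}\bigl|\sqrt{n}(S_{\zeta,n} - S_\zeta)(\beta,\zeta(\cdot,\beta))[\mathbf{h}^*(\cdot,\beta)] - \sqrt{n}(S_{\zeta,n} - S_\zeta)(\beta_0,\zeta_0(\cdot,\beta_0))[\mathbf{h}^*(\cdot,\beta_0)]\bigr| = o_p(1).$$

(A6) (Smoothness of the model) For some $\alpha > 1$ satisfying $\alpha\xi > 1/2$, and 
for $\theta$ in a neighborhood of $\theta_0$: $\{\theta : d(\theta,\theta_0) \le Cn^{-\xi}, \theta \in \Theta_n\}$, 

$$
\begin{aligned}
\bigl|S_\beta(\beta,\zeta(\cdot,\beta)) &- S_\beta(\beta_0,\zeta_0(\cdot,\beta_0)) - S_{\beta\beta}(\beta_0,\zeta_0(\cdot,\beta_0))(\beta - \beta_0) \\
&- S_{\beta\zeta}(\beta_0,\zeta_0(\cdot,\beta_0))[\zeta(\cdot,\beta) - \zeta_0(\cdot,\beta_0)]\bigr| = O(d^{\alpha}(\theta,\theta_0))
\end{aligned}
$$

and 

$$
\begin{aligned}
\bigl|S_\zeta(\beta,\zeta(\cdot,\beta))[\mathbf{h}^*(\cdot,\beta)] &- S_\zeta(\beta_0,\zeta_0(\cdot,\beta_0))[\mathbf{h}^*(\cdot,\beta_0)] \\
&- S_{\zeta\beta}(\beta_0,\zeta_0(\cdot,\beta_0))[\mathbf{h}^*(\cdot,\beta_0)](\beta - \beta_0) \\
&- S_{\zeta\zeta}(\beta_0,\zeta_0(\cdot,\beta_0))[\mathbf{h}^*(\cdot,\beta_0),\zeta(\cdot,\beta) - \zeta_0(\cdot,\beta_0)]\bigr| = O(d^{\alpha}(\theta,\theta_0)).
\end{aligned}
$$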
Note that $\xi$ in (A1) depends on the entropy of the sieve parameter space for $\zeta$ 
and cannot be arbitrarily small; it is controlled by the smoothness of the 
model in (A6). The convergence rate in (A1) needs to be achieved prior to 
obtaining asymptotic normality. Assumption (A2) is a common assumption 
for the maximum likelihood estimation and usually holds. The direction $\mathbf{h}^*$ 
in (A3) may be found through the equation in (A3). It is the least favorable 
direction when $m$ is the likelihood function. Assumptions (A4) and (A5) 
are usually verified either by the Donsker property or the maximal inequality 
of [28]. Assumption (A6) can be obtained by a Taylor expansion. The 
following theorem is an extension of Theorem 6.1 in [30] to the case where the 
infinite-dimensional parameter $\zeta$ is a function of the finite-dimensional parameter $\beta$. 



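Theorem 2.1. Suppose that assumptions (A1)-(A6) hold. Then 

$$\sqrt{n}(\hat\beta_n - \beta_0) = A^{-1}\sqrt{n}\,\mathbb{P}_n m^*(\beta_0,\zeta_0(\cdot,\beta_0);Z) + o_p(1) \to N(0, A^{-1}B(A^{-1})')$$

in distribution, where 

$$
\begin{aligned}
m^*(\beta_0,\zeta_0(\cdot,\beta_0);z) &= m_\beta(\beta_0,\zeta_0(\cdot,\beta_0);z) - m_\zeta(\beta_0,\zeta_0(\cdot,\beta_0);z)[\mathbf{h}^*], \\
B &= P\{m^*(\beta_0,\zeta_0(\cdot,\beta_0);Z)^{\otimes 2}\},
\end{aligned}
$$

and $A$ is given in assumption (A3). Here $a^{\otimes 2} = aa'$. 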

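Proof. The proof follows along the lines of the proof of Theorem 6.1 
in [30]. Assumptions (A1) and (A5) yield 

$$\sqrt{n}(S_{\beta,n} - S_\beta)(\hat\beta_n,\hat\zeta_n(\cdot,\hat\beta_n)) - \sqrt{n}(S_{\beta,n} - S_\beta)(\beta_0,\zeta_0(\cdot,\beta_0)) = o_p(1).$$

Since $S_{\beta,n}(\hat\beta_n,\hat\zeta_n(\cdot,\hat\beta_n)) = o_p(n^{-1/2})$ by (A4) and $S_\beta(\beta_0,\zeta_0(\cdot,\beta_0)) = 0$ by (A2), 
we have 

$$\sqrt{n}\,S_\beta(\hat\beta_n,\hat\zeta_n(\cdot,\hat\beta_n)) + \sqrt{n}\,S_{\beta,n}(\beta_0,\zeta_0(\cdot,\beta_0)) = o_p(1).$$

Similarly, 

$$\sqrt{n}\,S_\zeta(\hat\beta_n,\hat\zeta_n(\cdot,\hat\beta_n))[\mathbf{h}^*(\cdot,\hat\beta_n)] + \sqrt{n}\,S_{\zeta,n}(\beta_0,\zeta_0(\cdot,\beta_0))[\mathbf{h}^*(\cdot,\beta_0)] = o_p(1).$$

Combining these equalities and assumption (A6) yields 

$$
\begin{aligned}
&S_{\beta\beta}(\beta_0,\zeta_0(\cdot,\beta_0))(\hat\beta_n - \beta_0) + S_{\beta\zeta}(\beta_0,\zeta_0(\cdot,\beta_0))[\hat\zeta_n(\cdot,\hat\beta_n) - \zeta_0(\cdot,\beta_0)] \\
&\quad+ S_{\beta,n}(\beta_0,\zeta_0(\cdot,\beta_0)) + O(d^{\alpha}(\hat\theta_n,\theta_0)) = o_p(n^{-1/2})
\end{aligned}
\tag{2.1}
$$

and 

$$
\begin{aligned}
&S_{\zeta\beta}(\beta_0,\zeta_0(\cdot,\beta_0))[\mathbf{h}^*(\cdot,\beta_0)](\hat\beta_n - \beta_0) \\
&\quad+ S_{\zeta\zeta}(\beta_0,\zeta_0(\cdot,\beta_0))[\mathbf{h}^*(\cdot,\beta_0),\hat\zeta_n(\cdot,\hat\beta_n) - \zeta_0(\cdot,\beta_0)] \\
&\quad+ S_{\zeta,n}(\beta_0,\zeta_0(\cdot,\beta_0))[\mathbf{h}^*(\cdot,\beta_0)] + O(d^{\alpha}(\hat\theta_n,\theta_0)) = o_p(n^{-1/2}).
\end{aligned}
\tag{2.2}
$$

Since $\alpha > 1$ with $\alpha\xi > 1/2$, the rate of convergence assumption (A1) implies 
$\sqrt{n}\,O(d^{\alpha}(\hat\theta_n,\theta_0)) = o_p(1)$; then (2.1) and (2.2) together with (A3) yield 

$$
\begin{aligned}
&(S_{\beta\beta}(\beta_0,\zeta_0(\cdot,\beta_0)) - S_{\zeta\beta}(\beta_0,\zeta_0(\cdot,\beta_0))[\mathbf{h}^*(\cdot,\beta_0)])(\hat\beta_n - \beta_0) \\
&\quad= -(S_{\beta,n}(\beta_0,\zeta_0(\cdot,\beta_0)) - S_{\zeta,n}(\beta_0,\zeta_0(\cdot,\beta_0))[\mathbf{h}^*(\cdot,\beta_0)]) + o_p(n^{-1/2}),
\end{aligned}
$$

that is, 

$$-A(\hat\beta_n - \beta_0) = -\mathbb{P}_n m^*(\beta_0,\zeta_0(\cdot,\beta_0);Z) + o_p(n^{-1/2}).$$

This yields 

$$\sqrt{n}(\hat\beta_n - \beta_0) = A^{-1}\sqrt{n}\,\mathbb{P}_n m^*(\beta_0,\zeta_0(\cdot,\beta_0);Z) + o_p(1).$$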

3. Back to the linear model: The sieve maximum likelihood estimation. 

By taking the logarithm of the positive function $\lambda(\cdot)$ in (1.3), the function $g(\cdot)$ 
in (1.4) is no longer restricted to be positive, which eases the estimation. 
We now describe the spline-based sieve maximum likelihood estimation for 
model (1.1). Under the regularity conditions (C.1)-(C.3) stated in Section 4, 
we know that the observed residual times $\{Y_i - X_i'\beta : \beta \in \mathcal{B}, i = 1, \ldots, n\}$ are 
confined in some finite interval. Let $[a,b]$ be an interval of interest, where 
$-\infty < a < b < \infty$. Let $T_{K_n} = \{t_1, \ldots, t_{K_n}\}$ be a set of partition points of $[a,b]$ 
with $K_n = O(n^{\nu})$ and $\max_{1\le j\le K_n+1}|t_j - t_{j-1}| = O(n^{-\nu})$ for some constant 
$\nu \in (0,1/2)$. Let $S_n(T_{K_n},K_n,p)$ be the space of polynomial splines of order 
$p \ge 1$ defined in [23], Definition 4.1. According to Schumaker ([23], Corollary 4.10), 
there exists a set of B-spline basis functions $\{B_j, 1 \le j \le q_n\}$ with 
$q_n = K_n + p$ such that for any $s \in S_n(T_{K_n},K_n,p)$, we can write 

$$s(t) = \sum_{j=1}^{q_n}\gamma_j B_j(t), \tag{3.1}$$

where we follow [25] by requiring $\max_{j=1,\ldots,q_n}|\gamma_j| \le c_n$ that is allowed to 
grow with $n$ slowly enough. 
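
The basis in (3.1) can be evaluated with the standard Cox-de Boor recursion. The sketch below is one possible implementation, not the authors' code: the function name `bspline_basis`, the clamped knot vector (boundary knots repeated `order` times) and the right-closed last span are assumptions made for illustration. It returns one column per basis function, so the number of columns equals the number of interior knots plus the order.

```python
import numpy as np

def bspline_basis(u, interior_knots, a, b, order):
    """Evaluate all B-spline basis functions of the given order at the points u.

    Returns an array of shape (len(u), q) with q = len(interior_knots) + order.
    """
    u = np.atleast_1d(np.asarray(u, dtype=float))
    degree = order - 1
    # Clamped knot vector (an assumption): boundary knots repeated `order` times.
    knots = np.concatenate([np.full(order, a), np.sort(interior_knots), np.full(order, b)])
    # Degree-0 basis: indicators of the half-open knot spans [t_j, t_{j+1}).
    B = np.zeros((len(u), len(knots) - 1))
    for j in range(len(knots) - 1):
        B[:, j] = (knots[j] <= u) & (u < knots[j + 1])
    # Close the last non-empty span on the right so that u == b is covered.
    last = np.max(np.nonzero(knots[:-1] < knots[1:])[0])
    B[u == b, last] = 1.0
    # Cox-de Boor recursion, raising the degree one step at a time.
    for d in range(1, degree + 1):
        B_next = np.zeros((len(u), len(knots) - 1 - d))
        for j in range(len(knots) - 1 - d):
            left_den = knots[j + d] - knots[j]
            right_den = knots[j + d + 1] - knots[j + 1]
            if left_den > 0:   # terms with a zero denominator are dropped (0/0 := 0)
                B_next[:, j] += (u - knots[j]) / left_den * B[:, j]
            if right_den > 0:
                B_next[:, j] += (knots[j + d + 1] - u) / right_den * B[:, j + 1]
        B = B_next
    return B

# Cubic B-splines (order 4) with one interior knot on [0, 1] give 1 + 4 = 5 basis functions.
grid = np.linspace(0.0, 1.0, 11)
basis = bspline_basis(grid, [0.5], 0.0, 1.0, order=4)
```

A spline in the sieve space is then `basis @ gamma` for a coefficient vector `gamma` of length 5 in this example.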

Let $\gamma = (\gamma_1, \ldots, \gamma_{q_n})'$. Under suitable smoothness assumptions, $g_0(\cdot) = 
\log\lambda_0(\cdot)$ can be well approximated by some function in $S_n(T_{K_n},K_n,p)$. 
Therefore, we seek a member of $S_n(T_{K_n},K_n,p)$ together with a value of $\beta \in \mathcal{B}$ 
that maximizes the log likelihood function. Specifically, let $\hat\theta_n = (\hat\beta_n,\hat\gamma_n)$ be 
the value that maximizes 

$$l_n(\beta,\gamma) = n^{-1}\sum_{i=1}^{n}\Bigl[\int\sum_{j=1}^{q_n}\gamma_j B_j(t - X_i'\beta)\,dN_i(t) - \int I(Y_i \ge t)\exp\Bigl\{\sum_{j=1}^{q_n}\gamma_j B_j(t - X_i'\beta)\Bigr\}\,dt\Bigr]. \tag{3.2}$$



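Taking the first order derivatives of $l_n(\beta,\gamma)$ with respect to $\beta$ and $\gamma$ and setting 
them to zero, we can obtain the score equations. Since the integrals here 
are univariate integrals, their numerical implementation can be easily done 
by the one-dimensional Gaussian-quadrature method. A Newton-Raphson 
algorithm or any other gradient-based search algorithm can be applied to 
solve the score equations for all parameters $\theta = (\beta,\gamma)$, for example, 

$$\theta^{(m+1)} = \theta^{(m)} - H(\theta^{(m)})^{-1}\,S(\theta^{(m)}),$$

where $\theta^{(m)} = (\beta^{(m)},\gamma^{(m)})$ is the parameter estimate from the $m$th iteration, 
and 

$$
S(\theta) = \begin{pmatrix} \dfrac{\partial l_n(\beta,\gamma)}{\partial\beta} \\[2ex] \dfrac{\partial l_n(\beta,\gamma)}{\partial\gamma} \end{pmatrix},
\qquad
H(\theta) = \begin{pmatrix} \dfrac{\partial^2 l_n(\beta,\gamma)}{\partial\beta\,\partial\beta'} & \dfrac{\partial^2 l_n(\beta,\gamma)}{\partial\beta\,\partial\gamma'} \\[2ex] \dfrac{\partial^2 l_n(\beta,\gamma)}{\partial\gamma\,\partial\beta'} & \dfrac{\partial^2 l_n(\beta,\gamma)}{\partial\gamma\,\partial\gamma'} \end{pmatrix}
$$

are the score function and Hessian matrix of parameter $\theta$. For any fixed $\beta$ 
and $n$, it is clearly seen that $l_n(\beta,\gamma)$ in (3.2) is concave with respect to $\gamma$ 
and goes to $-\infty$ if any $\gamma_j$ approaches either $\infty$ or $-\infty$; hence $\hat\gamma_n$ must be 
bounded, which yields an estimator of $s$ in $S_n(T_{K_n},K_n,p)$. 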
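
The quadrature step mentioned above can be sketched as follows. After the change of variable from $t$ to the residual scale, the integral term for subject $i$ becomes an integral of $\exp\{g(u)\}$ from the lower end of the residual interval up to $Y_i - X_i'\beta$, which a Gauss-Legendre rule handles directly. The function name `sieve_loglik`, the use of the lower endpoint `a` as the lower integration limit, and the choice of 20 nodes are assumptions for this illustration; `g` stands for any vectorized log hazard, for example a B-spline expansion.

```python
import numpy as np

def sieve_loglik(beta, g, y, delta, x, a, n_nodes=20):
    """Average of delta_i * g(r_i) - int_a^{r_i} exp(g(u)) du, with r_i = y_i - x_i' beta."""
    nodes, weights = np.polynomial.legendre.leggauss(n_nodes)  # rule on [-1, 1]
    r = y - x @ beta                       # residuals on the transformed time scale
    half = 0.5 * (r - a)                   # half-length of each interval [a, r_i]
    mid = 0.5 * (r + a)                    # midpoint of each interval
    u = mid[:, None] + half[:, None] * nodes[None, :]   # mapped nodes, shape (n, n_nodes)
    cum_hazard = half * (np.exp(g(u)) @ weights)        # quadrature approximation per subject
    return np.mean(delta * g(r) - cum_hazard)

# Small synthetic check with a linear log hazard g(u) = u, for which the
# integral is available in closed form: exp(r) - exp(a).
rng = np.random.default_rng(0)
x = rng.normal(size=(50, 2))
beta = np.array([1.0, -0.5])
y = x @ beta + rng.uniform(0.0, 2.0, size=50)
delta = rng.integers(0, 2, size=50)
a = -1.0
value = sieve_loglik(beta, lambda u: u, y, delta, x, a)
```

Because the integrand is smooth, a modest number of nodes is typically enough; the node count can be raised if the fitted log hazard is very wiggly.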

As stated in the next section, the distribution of $\hat\beta_n$ can be approximated 
by a normal distribution. One way to estimate the variance matrix of $\hat\beta_n$ is 
to approximate the (inverse of the) information matrix based on the efficient 
score function for $\beta_0$ by plugging in the estimated parameters $(\hat\beta_n,\hat\lambda_n(\cdot))$. 
The consistency of such a variance estimator is given in Theorem 4.3. Another 
way is to invert the observed information matrix from the last Newton-Raphson 
iteration, taking into account that we are also estimating the nuisance 
parameter $\gamma$. The consistency of the latter approach may be proved 
in a similar way as Example 4 in [24] or via Theorem 2.2 in [9], and we leave 
detailed derivation to interested readers. Simulations indicate that both 
estimators work reasonably well. 
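
The Newton-Raphson update and the observed-information variance estimate can be illustrated on a small concave problem. The sketch below does not implement the sieve likelihood (3.2); it uses a Poisson regression log likelihood as a stand-in (an assumption made purely so the example is short and self-contained), and the names `newton_raphson`, `score` and `hessian` are hypothetical.

```python
import numpy as np

def newton_raphson(score, hessian, theta0, tol=1e-8, max_iter=100):
    """Iterate theta <- theta - H(theta)^{-1} S(theta) until the step is small."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        step = np.linalg.solve(hessian(theta), score(theta))
        theta = theta - step
        if np.max(np.abs(step)) < tol:
            break
    return theta

# Stand-in concave log likelihood: l(theta) = sum_i {y_i z_i'theta - exp(z_i'theta)}.
rng = np.random.default_rng(1)
z = np.column_stack([np.ones(500), rng.normal(size=500)])
y = rng.poisson(np.exp(z @ np.array([0.5, 0.3])))

def score(theta):
    return z.T @ (y - np.exp(z @ theta))

def hessian(theta):
    return -(z * np.exp(z @ theta)[:, None]).T @ z

theta_hat = newton_raphson(score, hessian, np.zeros(2))
# Observed information is the negative Hessian at the final iterate; its inverse
# estimates the covariance of all parameters jointly.
cov_hat = np.linalg.inv(-hessian(theta_hat))
se_hat = np.sqrt(np.diag(cov_hat))
```

In the sieve setting the same inversion would be applied to the full Hessian in $(\beta,\gamma)$, and the block corresponding to $\beta$ would be read off as its variance estimate.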

4. Asymptotic results. Denote $\varepsilon_\beta = Y - X'\beta$ and $\varepsilon_0 = Y - X'\beta_0$. We 
assume the following regularity conditions: 

(C.1) The true parameter $\beta_0$ belongs to the interior of a compact set 
$\mathcal{B} \subseteq \mathbb{R}^d$. 

(C.2) (a) The covariate $X$ takes values in a bounded subset $\mathcal{X} \subseteq \mathbb{R}^d$; 
(b) $E(XX')$ is nonsingular. 

(C.3) There is a truncation time $\tau < \infty$ such that, for some constant $\delta$, 
$P(\varepsilon_0 > \tau \mid X) \ge \delta > 0$ almost surely with respect to the probability measure 
of $X$. This implies that $\Lambda_0(\tau) \le -\log\delta < \infty$. 

(C.4) The error $e_0$'s density $f$ and its derivative $\dot f$ are bounded and 

$$\int\{\dot f(t)/f(t)\}^2 f(t)\,dt < \infty.$$

(C.5) The conditional density of $C$ given $X$ and its derivative $\dot g_{C|X}$ are 
uniformly bounded for all possible values of $X$, that is, 

$$\sup_{x\in\mathcal{X}} g_{C|X}(t \mid X = x) \le K_1, \qquad \sup_{x\in\mathcal{X}}|\dot g_{C|X}(t \mid X = x)| \le K_2$$

for all $t \le \tau$ with some constants $K_1, K_2 > 0$, where $\tau$ is the truncation time 
defined in condition (C.3). 

(C.6) Let $\mathcal{G}^p$ denote the collection of bounded functions $g$ on $[a,b]$ with 
bounded derivatives $g^{(j)}$, $j = 1, \ldots, k$, and the $k$th derivative $g^{(k)}$ satisfies 
the following Lipschitz continuity condition: 

$$|g^{(k)}(s) - g^{(k)}(t)| \le L|s - t|^{m} \quad\text{for } s,t \in [a,b],$$

where $k$ is a positive integer and $m \in (0,1]$ such that $p = k + m \ge 3$, and 
$L < \infty$ is an unknown constant. The true log hazard function $g_0(\cdot) = \log\lambda_0(\cdot)$ 
belongs to $\mathcal{G}^p$, where $[a,b]$ is a bounded interval. 

(C.7) For some $\eta \in (0,1)$, $u'\mathrm{Var}(X \mid \varepsilon_0)u \ge \eta\,u'E(XX' \mid \varepsilon_0)u$ almost surely 
for all $u \in \mathbb{R}^d$. 

Condition (C.1) is a common regularity assumption that has been imposed 
in the literature; see, for example, [17]. Conditions (C.2)(a), (C.3) and (C.4) 
were also assumed in [26]. Condition (C.5) implies Condition B in [26]. 
In condition (C.6), we require $p \ge 3$ to provide desirable controls of the 
spline approximation error rates of the first and second derivatives of $g_0$ (see 
Corollary 6.21 of [23]), which are needed in verifying assumptions (A4)-(A6). 
Condition (C.7) was also proposed for the panel count data model in [30]. 
As noted in their Remark 3.4, condition (C.7) can be justified in many 
applications when condition (C.2)(b) is satisfied. The bounded interval $[a,b]$ 
in (C.6) may be chosen as $a = \inf_{y,x}(y - x'\beta_0) > -\infty$ and $b = \tau < \infty$ under 
(C.1)-(C.3), which is what we use in the following. 

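Now define the collection of functions $\mathcal{H}^p$ as follows: 

$$\mathcal{H}^p = \{\zeta(\cdot,\beta) : \zeta(t,x,\beta) = g(\psi(t,x,\beta)),\ g \in \mathcal{G}^p,\ t \in [a,b],\ x \in \mathcal{X},\ \beta \in \mathcal{B}\},$$

where 

$$\psi(t,x,\beta) = t - x'(\beta - \beta_0)$$

and $\mathcal{G}^p$ is defined in (C.6). Here $\zeta$ is a composite function of $g$ composed 
with $\psi$. Note that $\zeta(t,x,\beta_0) = g(t)$. Then for $\zeta(\cdot,\beta) \in \mathcal{H}^p$ we define the 
following norm: 

$$\|\zeta(\cdot,\beta)\|_2 = \Bigl[\int_{\mathcal{X}}\int_a^b\{g(t - x'(\beta - \beta_0))\}^2\,d\Lambda_0(t)\,dF_X(x)\Bigr]^{1/2}. \tag{4.1}$$

We also have the following collection of scores: 

$$\mathbb{H} = \Bigl\{h : h(\cdot,\beta) = \frac{\partial\zeta_\eta(\cdot,\beta)}{\partial\eta}\Big|_{\eta=0} = w(\psi(\cdot,\beta)),\ \zeta_\eta \in \mathcal{H}^p\Bigr\},$$

in which $h(t,x,\beta) = w(\psi(t,x,\beta)) = w(t - x'(\beta - \beta_0))$. 

For any $\theta_1 = (\beta_1,\zeta_1(\cdot,\beta_1))$ and $\theta_2 = (\beta_2,\zeta_2(\cdot,\beta_2))$ in the space of $\Theta^p = 
\mathcal{B} \times \mathcal{H}^p$, define the following distance: 

$$d(\theta_1,\theta_2) = \{|\beta_1 - \beta_2|^2 + \|\zeta_1(\cdot,\beta_1) - \zeta_2(\cdot,\beta_2)\|_2^2\}^{1/2}. \tag{4.2}$$

Let $\mathcal{G}_n^p = S_n(T_{K_n},K_n,p)$. Denote 

$$\mathcal{H}_n^p = \{\zeta(\cdot,\beta) : \zeta(t,x,\beta) = g(\psi(t,x,\beta)),\ g \in \mathcal{G}_n^p,\ t \in [a,b],\ x \in \mathcal{X},\ \beta \in \mathcal{B}\}$$

and $\Theta_n^p = \mathcal{B} \times \mathcal{H}_n^p$. Clearly $\mathcal{H}_n^p \subseteq \mathcal{H}_{n+1}^p \subseteq \cdots \subseteq \mathcal{H}^p$ for all $n \ge 1$. The sieve 
estimator $\hat\theta_n = (\hat\beta_n,\hat\zeta_n(\cdot,\hat\beta_n))$, where $\hat\zeta_n(t,x,\hat\beta_n) = \hat g_n(t - x'(\hat\beta_n - \beta_0))$, is the 
maximizer of the empirical log-likelihood $n^{-1}l_n(\theta;Z)$ over the sieve space $\Theta_n^p$. 
The following theorem gives the convergence rate of the proposed estimator 
$\hat\theta_n$ to the true parameter $\theta_0 = (\beta_0,\zeta_0(\cdot,\beta_0)) = (\beta_0,g_0)$. 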

Theorem 4.1. Let Kn = 0{vy), where v satisfies the restriction ^^ ■, < 
z/ < ^ with p being the smoothness parameter defined in condition (C.6). 
Suppose conditions (C.1)-(C.7) hold, and the failure time $T$ follows 
model (1.1). Then 

where $d(\cdot,\cdot)$ is defined in (4.2). 

Remark. It is worth pointing out that the sieve space $\mathcal{G}_n^p$ does not have 
to be restricted to the B-spline space; it can be any sieve space as long 
as the estimator $\hat\theta_n \in \mathcal{B} \times \mathcal{H}_n^p$ satisfies the conditions of Theorem 1 in [25]. 
We refer to [4] for a comprehensive discussion of the sieve estimation for 
semiparametric models in general sieve spaces. Our choice of the B-spline 
space is primarily motivated by its simplicity of numerical implementation, 
which is a tremendous advantage of the proposed approach over existing 
numerical methods for the accelerated failure time models, in particular, 
the linear programming approach. 

We provide a proof of Theorem 4.1 in the supplementary material [8] by 
checking the conditions of Theorem 1 in [25]. Theorem 4.1 implies that if 
$\nu = 1/(1 + 2p)$, $d(\hat\theta_n,\theta_0) = O_p(n^{-p/(1+2p)})$, which is the optimal convergence 
rate in the nonparametric regression setting. Although the overall convergence 
rate is slower than $n^{-1/2}$, the next theorem states that the proposed 
estimator of the regression parameter is still asymptotically normal and 
semiparametrically efficient. 
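
For a sense of scale, the rate exponent at the choice $\nu = 1/(1+2p)$ can be evaluated at the smallest smoothness allowed by condition (C.6) and at one larger value. The lines below are plain arithmetic on the stated rate and assume nothing beyond it.

```latex
% Rate exponent p/(1+2p) at \nu = 1/(1+2p):
p = 3:\quad \nu = \tfrac{1}{7},\qquad d(\hat\theta_n,\theta_0) = O_p\bigl(n^{-3/7}\bigr),\qquad \tfrac{3}{7}\approx 0.429 < \tfrac12, \\
p = 4:\quad \nu = \tfrac{1}{9},\qquad d(\hat\theta_n,\theta_0) = O_p\bigl(n^{-4/9}\bigr),\qquad \tfrac{4}{9}\approx 0.444 < \tfrac12, \\
\text{in general}\quad \frac{p}{1+2p} = \frac12 - \frac{1}{2(1+2p)} < \frac12 .
```

The exponent approaches $1/2$ as $p$ grows but never reaches it, which is why the overall rate is slower than the parametric rate even though the regression parameter itself is estimated at the $\sqrt{n}$ rate.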

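Theorem 4.2. Given the following efficient score function for the censored 
linear model derived by [22], 

$$l^*_{\beta_0}(Y,\Delta,X) = \int\{X - P(X \mid Y - X'\beta_0 \ge t)\}\Bigl\{-\frac{\dot\lambda_0}{\lambda_0}(t)\Bigr\}\,dM(t),$$

where 

$$M(t) = \Delta I(Y - X'\beta_0 \le t) - \int_{-\infty}^{t} I(Y - X'\beta_0 \ge s)\,\lambda_0(s)\,ds$$

is the failure counting process martingale, and 

$$P(X \mid Y - X'\beta_0 \ge t) = \frac{P\{X I(Y - X'\beta_0 \ge t)\}}{P\{I(Y - X'\beta_0 \ge t)\}}$$

was shown by [21]. Suppose that the conditions in Theorem 4.1 hold, and 
$I(\beta_0) = P\{l^*_{\beta_0}(Y,\Delta,X)^{\otimes 2}\}$ is nonsingular, then 

$$\sqrt{n}(\hat\beta_n - \beta_0) = n^{-1/2}I^{-1}(\beta_0)\sum_{i=1}^{n} l^*_{\beta_0}(Y_i,\Delta_i,X_i) + o_p(1) \to N(0, I^{-1}(\beta_0))$$

in distribution. 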

The proof of Theorem 4.2 is where we need to apply our general sieve M- 
theorem proposed in Section 2. We prove by checking assumptions (A1)-(A6). 
Details are provided in Section 7. The following theorem gives consistency 
of the variance estimator based on the above efficient score. 

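Theorem 4.3. Suppose the conditions in Theorem 4.2 hold. Denote 

$$\hat l^*_{\hat\beta_n}(Y,\Delta,X) = \int\{X - \bar X(t;\hat\beta_n)\}\{-\dot{\hat g}_n(t)\}\,d\hat M(t),$$

where 

$$\bar X(t;\hat\beta_n) = \frac{\mathbb{P}_n\{X I(Y - X'\hat\beta_n \ge t)\}}{\mathbb{P}_n\{I(Y - X'\hat\beta_n \ge t)\}}$$

and 

$$\hat M(t) = \Delta I(Y - X'\hat\beta_n \le t) - \int_{-\infty}^{t} I(Y - X'\hat\beta_n \ge s)\exp\{\hat g_n(s)\}\,ds.$$

Then $\mathbb{P}_n\{\hat l^*_{\hat\beta_n}(Y,\Delta,X)^{\otimes 2}\} \to P\{l^*_{\beta_0}(Y,\Delta,X)^{\otimes 2}\} = I(\beta_0)$ in probability. 

It is clearly seen that $\bar X(t;\hat\beta_n)$ in Theorem 4.3 estimates $P(X \mid Y - X'\beta_0 \ge t)$ 
in Theorem 4.2. The proof of Theorem 4.3 is provided in the supplementary 
material [8]. 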
5. Numerical examples. 

5.1. Simulations. Extensive simulations are carried out to evaluate the 
finite sample performance of the proposed method. In the simulation studies, 
failure times are generated from the model 

$$\log T = 2 + X_1 + X_2 + e_0,$$

where $X_1$ is Bernoulli with success probability 0.5, and $X_2$ is independent normal 
with mean 0 and standard deviation 0.5 truncated at $\pm 2$. This is 
the same model used by [15] and [32]. We consider six error distributions: 
standard normal; standard extreme-value; mixtures of $N(0,1)$ and $N(0,3^2)$ 
with mixing probabilities (0.5, 0.5) and (0.95, 0.05), denoted by $0.5N(0,1) + 
0.5N(0,3^2)$ and $0.95N(0,1) + 0.05N(0,3^2)$, respectively; Gumbel$(-0.5\mu, 0.5)$ 
with $\mu$ being the Euler constant; and $0.5N(0,1) + 0.5N(-1,0.5^2)$. The first 
four distributions were also considered by [32]. Similar to [32], the censoring 
times are generated from the uniform $[0,c]$ distribution, where $c$ is chosen to 
produce a 25% censoring rate. We set the sample size $n$ to 200, 400 and 600. 
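
The data-generating design just described can be sketched as follows for the standard normal error case. Several details are assumptions made for this illustration and are not stated in the text: the censoring time is drawn on the original time scale and then log-transformed, the truncation of $X_2$ is done by rejection sampling, and the constant $c$ is calibrated by bisection on a large pilot sample. The function names are hypothetical.

```python
import numpy as np

def draw_failure(n, rng):
    """Covariates and log failure times from log T = 2 + X1 + X2 + e0."""
    x1 = rng.binomial(1, 0.5, size=n).astype(float)
    x2 = rng.normal(0.0, 0.5, size=n)
    bad = np.abs(x2) > 2
    while bad.any():                      # rejection step for the truncation at +/-2
        x2[bad] = rng.normal(0.0, 0.5, size=bad.sum())
        bad = np.abs(x2) > 2
    e0 = rng.normal(size=n)               # error distribution (a): N(0, 1)
    return 2.0 + x1 + x2 + e0, np.column_stack([x1, x2])

def simulate(n, c, rng):
    """Return (Y, Delta, X) on the log-time scale with Uniform(0, c] censoring."""
    log_t, x = draw_failure(n, rng)
    log_c = np.log(c * (1.0 - rng.random(n)))   # 1 - U lies in (0, 1], avoiding log(0)
    y = np.minimum(log_t, log_c)
    delta = (log_t <= log_c).astype(int)
    return y, delta, x

def calibrate_c(target=0.25, n_pilot=200_000, seed=0):
    """Bisection (on the log scale) for the c giving the target censoring rate."""
    rng = np.random.default_rng(seed)
    log_t, _ = draw_failure(n_pilot, rng)
    log_u = np.log(1.0 - rng.random(n_pilot))   # common random numbers across iterations
    lo, hi = 1e-3, 1e6
    for _ in range(60):
        mid = np.sqrt(lo * hi)
        rate = np.mean(log_t > np.log(mid) + log_u)   # censoring rate decreases in c
        if rate > target:
            lo = mid
        else:
            hi = mid
    return np.sqrt(lo * hi)

c_star = calibrate_c()
y, delta, x = simulate(400, c_star, np.random.default_rng(2024))
```

Other error distributions from the list above can be substituted by changing the single line that draws `e0`, after which `c` should be recalibrated.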
We choose cubic B-splines with one interior knot for n = 200 and 400, and 
two interior knots for n = 600. We perform the sieve maximum likelihood 
analysis and obtain the estimates of the slope parameters using the Newton- 
Raphson algorithm that updates (/3, 7) iteratively. We stop iteration when 
the change of parameter estimates or the gradient value is less than a pre- 
specified tolerance value that is set to be 10~^ in our simulations. Log-rank 
and Gehan-weighted estimators are included for efficiency comparisons. We 
calculate the theoretical semiparametric efficiency bound $I^{-1}(\beta_0)$, and scale 
it by the sample size, that is, $\sigma^* = \sqrt{I^{-1}(\beta_0)/n}$, which serves as the reference 
standard error under the fully efficient situation. Table 1 summarizes 
the results of these studies based on 1,000 simulated datasets. The biases of 
the proposed estimators of $\beta_1$ and $\beta_2$ are negligible. Both variance estimation 
procedures, denoted as ^SEE (the standard error estimates by inverting 
the information matrix based on the efficient score function) and ^SEE (the 
standard error estimates by inverting the observed information matrix of 
all parameters including nuisance parameters), yield good standard error estimates 
for the parameter estimators compared with the empirical standard 
error SE, and the 95% confidence intervals have proper coverage probabilities, 
especially when the sample size is large. For the $N(0,1)$ error and the 
two mixtures of normal errors that are also considered in [32], the proposed 
estimators are more efficient than the log-rank estimators and have similar 
variances to the Gehan-weighted estimators. For the standard extreme-value 
error, the proposed estimators are more efficient than the Gehan-weighted 
estimator and similar to the log-rank estimator that is known to be the 
most efficient estimator under this particular error distribution. For the 
Table 1 
Summary statistics for the simulation studies. The true slope parameters are $\beta_1 = 1$ and 
$\beta_2 = 1$. (a): $N(0,1)$; (b): standard extreme-value; (c): $0.5N(0,1) + 0.5N(0,3^2)$; 
(d): $0.95N(0,1) + 0.05N(0,3^2)$; (e): Gumbel$(-0.5\mu, 0.5)$; (f): $0.5N(0,1) + 0.5N(-1,0.5^2)$ 



                         B-spline MLE
Err.
dist   n    Par.    Bias     SE     ^SEE   (CP)     ^SEE   (CP)

(a)   200   β1     0.003   0.168   0.149  (0.912)   0.155  (0.924)
            β2     0.003   0.167   0.153  (0.928)   0.156  (0.928)
      400   β1     0.006   0.110   0.108  (0.948)   0.110  (0.950)
            β2     0.001   0.110   0.109  (0.944)   0.110  (0.945)
      600   β1     0.001   0.092   0.088  (0.939)   0.090  (0.943)
            β2     0.005   0.091   0.089  (0.945)   0.090  (0.944)

(b)   200   β1    -0.009   0.180   0.154  (0.894)   0.161  (0.903)
            β2     0.004   0.182   0.162  (0.903)   0.163  (0.915)
      400   β1     0.000   0.126   0.113  (0.914)   0.115  (0.923)
            β2     0.008   0.118   0.116  (0.934)   0.116  (0.938)
      600   β1     0.001   0.102   0.093  (0.919)   0.094  (0.923)
            β2     0.011   0.098   0.095  (0.944)   0.095  (0.945)

(c)   200   β1     0.014   0.300   0.281  (0.930)   0.279  (0.924)
            β2     0.000   0.306   0.285  (0.916)   0.282  (0.918)
      400   β1     0.034   0.199   0.206  (0.955)   0.200  (0.949)
            β2    -0.003   0.207   0.208  (0.949)   0.202  (0.942)
      600   β1     0.035   0.168   0.171  (0.957)   0.165  (0.949)
            β2    -0.007   0.169   0.172  (0.956)   0.166  (0.956)

(d)   200   β1    -0.013   0.172   0.157  (0.926)   0.164  (0.927)
            β2    -0.004   0.180   0.160  (0.908)   0.164  (0.913)
      400   β1     0.003   0.119   0.113  (0.944)   0.116  (0.948)
            β2     0.003   0.117   0.114  (0.942)   0.116  (0.953)
      600   β1    -0.003   0.097   0.093  (0.948)   0.095  (0.952)
            β2     0.001   0.096   0.094  (0.942)   0.095  (0.944)

(e)   200   β1     0.004   0.080   0.077  (0.944)   0.078  (0.946)
            β2    -0.001   0.083   0.080  (0.929)   0.078  (0.934)
      400   β1    -0.005   0.055   0.055  (0.946)   0.055  (0.951)
            β2     0.003   0.055   0.056  (0.954)   0.056  (0.950)
      600   β1    -0.003   0.047   0.045  (0.940)   0.045  (0.938)
            β2    -0.001   0.047   0.046  (0.944)   0.045  (0.943)

(f)   200   β1    -0.002   0.126   0.117  (0.918)   0.120  (0.929)
            β2     0.000   0.133   0.120
      400   β1    -0.002   0.087   0.084
            β2     0.004   0.086   0.086
      600   β1     0.003   0.074   0.070
            β2     0.003   0.074   0.070

Further column groups of Table 1: Log-rank (Bias, SE); Gehan (Bias, SE); σ*



(0.917 


0.121 


0.926) 


(0.949 


0.085 


0.950) 


(0.951 


0.086 


0.953) 


(0.929 


0.070 


0.931) 


(0.936 


0.070 


0.936) 



0.000 0.170 
0.004 0.171 
0.005 0.115 
0.002 0.116 
0.001 0.096 
0.005 0.097 

-0.008 0.168 
0.005 0.170 

-0.001 0.124 
0.010 0.116 
0.001 0.100 
0.011 0.097 

-0.020 0.315 
0.002 0.317 
0.002 0.218 

-0.001 0.222 
0.003 0.185 

-0.004 0.190 

-0.010 0.181 

-0.005 0.184 

0.004 0.126 

0.004 0.126 

-0.002 0.105 

0.002 0.105 

-0.001 0.111 
0.000 0.114 

-0.003 0.079 
0.003 0.081 
0.000 0.067 

-0.002 0.066 

-0.002 0.159 
0.002 0.164 
0.003 0.114 
0.003 0.111 
0.005 0.101 
0.009 0.104 



0.002 0.159 0.155 

0.002 0.160 0.156 

0.008 0.108 0.110 

0.001 0.109 0.110 

0.002 0.093 0.090 

0.003 0.092 0.090 

-0.007 0.190 0.165 

0.005 0.195 0.169 

0.000 0.143 0.117 

0.012 0.135 0.120 

0.000 0.114 0.095 

0.007 0.114 0.098 

-0.019 0.292 0.259 

0.002 0.288 0.260 

0.002 0.197 0.183 

-0.002 0.200 0.184 

0.001 0.163 0.150 

-0.002 0.168 0.150 

-0.007 0.166 0.167 

-0.005 0.173 0.166 

0.006 0.117 0.118 

0.003 0.115 0.118 

0.002 0.097 0.096 

0.003 0.094 0.096 

0.004 0.088 0.079 

0.000 0.091 0.080 

-0.004 0.061 0.056 

0.003 0.063 0.056 

-0.001 0.052 0.045 

-0.001 0.051 0.046 

-0.001 0.128 0.119 

0.001 0.134 0.116 

0.000 0.091 0.084 

0.004 0.090 0.082 

0.001 0.074 0.069 

0.004 0.075 0.067 
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Table 2 

Regression parameter estimates and standard error estimates for the Stanford heart 

transplant data. The proposed estimators are compared with Gehan-weighted estimators 

reported in [I4] and Buckley- James estimators reported in [18] 

B-spline MLE Gehan-weighted Buckley-James 



Covariate Est. SE Est. SE Est. SE 

M.l Age -0.0237 0.0068 -0.0211 0.0106 -0.015 0.008 

T5 -0.2118 0.1271 -0.0265 0.1507 -0.003 0.134 

M.2 Age 0.1022 0.0245 0.1046 0.0474 0.107 0.037 

Age^ -0.0016 0.0004 -0.0017 0.0006 -0.0017 0.0005 



Gumbel(-0.5^,0.5) and 0.5iV(0,l) + 0.5N (-1,0.5^) errors, the proposed 
estimators are more efficient than the other two estimators. Under all six 
error distributions, the standard errors of the proposed estimators are close 
to the efficient theoretical standard errors. The sample averages of the es- 
timates for Ao under different simulation settings are reasonably close to 
corresponding true curves (results not shown here; see [7] for details). 
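The summary columns reported for such a simulation study (bias, empirical standard error, average standard error estimate, coverage probability, and the reference standard error $\sigma^*$) can be sketched in Python as follows. This is a minimal illustration and not the authors' code: the function names `summarize` and `reference_se`, the use of a Wald interval with the normal quantile 1.959964, and the toy inputs in the usage note are all assumptions made here.

```python
import math
from statistics import mean, stdev


def summarize(estimates, se_estimates, truth, z=1.959964):
    """Summarize replicates of one coefficient across simulated datasets.

    estimates    : point estimates, one per simulated dataset
    se_estimates : matching standard error estimates
    truth        : true parameter value
    z            : normal quantile for a 95% Wald interval (assumed here)
    """
    bias = mean(estimates) - truth
    emp_se = stdev(estimates)        # empirical standard error (SE)
    mean_see = mean(se_estimates)    # average standard error estimate (SEE)
    covered = [abs(b - truth) <= z * s
               for b, s in zip(estimates, se_estimates)]
    cp = sum(covered) / len(covered)  # coverage probability (CP)
    return {"bias": bias, "se": emp_se, "see": mean_see, "cp": cp}


def reference_se(inverse_information, n):
    """sigma* = sqrt(I^{-1}(beta_0) / n) for one diagonal entry of I^{-1}."""
    return math.sqrt(inverse_information / n)
```

For example, `summarize([0.9, 1.1, 1.0, 1.2, 0.8], [0.1] * 5, 1.0)` treats five hypothetical replicates with true value 1, and `reference_se(4.0, 400)` scales a hypothetical inverse information of 4 by a sample size of 400.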

5.2. A real data example. We use the Stanford heart transplant data [18] 
as an illustrative example. This dataset was also analyzed by [15] using their 
proposed least squares estimators. Following their analysis, we consider the 
same two models: the first one regresses the base-10 logarithm of the survival 
time on age at transplant and T5 mismatch score for the 157 patients with 
complete records on T5 measure, and the second one regresses the base-10 
logarithm of the survival time on age and age$^2$. There were 55 censored
patients. We fit these two models using the proposed method with five cubic 
B-spline basis functions. 
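To make the phrase "five cubic B-spline basis functions" concrete, the following Python sketch evaluates a cubic B-spline basis by the Cox-de Boor recursion. The knot placement is not stated in the text, so the sketch assumes a clamped knot vector with a single interior knot at the midpoint of $[a,b]$, which gives exactly five cubic basis functions; the helper names `clamped_knots` and `bspline_basis` are hypothetical.

```python
def clamped_knots(a, b):
    """Knot vector for five cubic B-splines on [a, b].

    Assumption of this sketch: one interior knot at the midpoint, with the
    boundary knots repeated four times (order 4 = cubic).
    """
    return [a] * 4 + [(a + b) / 2.0] + [b] * 4


def bspline_basis(x, knots, degree):
    """Values at x of all B-spline basis functions of the given degree."""
    last = knots[-1]
    # Degree 0: indicators of half-open knot intervals; the right endpoint
    # of the domain is assigned to the last non-empty interval.
    vals = []
    for i in range(len(knots) - 1):
        left, right = knots[i], knots[i + 1]
        inside = (left <= x < right) or (x == last and left < right == last)
        vals.append(1.0 if inside else 0.0)
    # Cox-de Boor recursion, skipping terms with zero-width knot spans.
    for d in range(1, degree + 1):
        new_vals = []
        for i in range(len(knots) - d - 1):
            term = 0.0
            den1 = knots[i + d] - knots[i]
            if den1 > 0:
                term += (x - knots[i]) / den1 * vals[i]
            den2 = knots[i + d + 1] - knots[i + 1]
            if den2 > 0:
                term += (knots[i + d + 1] - x) / den2 * vals[i + 1]
            new_vals.append(term)
        vals = new_vals
    return vals
```

With `knots = clamped_knots(0.0, 1.0)`, the call `bspline_basis(0.25, knots, 3)` returns five nonnegative values that sum to one. A log-hazard of the form $g(t) = \sum_k \gamma_k B_k(t)$ is then a linear combination of these values.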

We report the parameter estimates and the standard error estimates in 
Table 2 and compare them with the Gehan-weighted estimators reported 
by [15] and the Buckley-James estimators reported by [18]. For the first
model, the parameter estimates for the age effect are fairly similar among 
all estimators, and the standard error estimate from the proposed method 
tends to be smaller, while the parameter estimates for the T5 mismatch 
score vary across different estimators with none of them being significant at 
the 0.05 level. The disparity of the T5 effect may be due to what was pointed 
out by [18]: the accelerated failure time model with age and T5 as covariates 
does not fit the data ideally. For the second model with age and age$^2$ being
the covariates, the point estimates are very similar across all methods and 
the standard error estimates from the proposed method are the smallest. 
6. Discussion. By applying the proposed general sieve M-estimation theory
for semiparametric models with bundled parameters, we are able to derive
the asymptotic distribution for the sieve maximum likelihood estimator
in a linear regression model where the response variable is subject to right
censoring. By providing an estimating procedure that is both statistically
and computationally efficient, this work makes the linear model a more viable
alternative to the Cox proportional hazards model. Compared with the existing
methods for estimating $\beta$ in a linear model, the proposed method has three
advantages. First, the estimating functions are smooth, in contrast
to the discrete estimating functions in the existing estimation methods; thus
the root search is easier and can be done quickly by conventional iterative
methods such as the Newton-Raphson algorithm. Second, the standard error
estimates are obtained directly by inverting either the efficient information
matrix for the regression parameters or the observed information matrix of
all parameters; either method is more computationally tractable than
re-sampling techniques. Third, the proposed estimator achieves the
semiparametric efficiency bound.
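The first two advantages can be illustrated on a deliberately simple toy problem. The Python sketch below is not the proposed sieve estimator: it assumes a constant log-hazard $g \equiv \theta$ on $[0,\infty)$, for which the censored-data log-likelihood reduces to $\sum_i \{\Delta_i\theta - e^{\theta}Y_i\}$, and it runs Newton-Raphson with a stopping rule on the change in the estimate or the gradient. The tolerance `1e-8` and the function names are assumptions of this sketch.

```python
import math


def newton_raphson(score, hessian, theta0, tol=1e-8, max_iter=50):
    """One-dimensional Newton-Raphson root search for a smooth score."""
    theta = theta0
    for _ in range(max_iter):
        theta_new = theta - score(theta) / hessian(theta)
        # Stop when the update or the gradient falls below the tolerance.
        if abs(theta_new - theta) < tol or abs(score(theta_new)) < tol:
            return theta_new
        theta = theta_new
    raise RuntimeError("Newton-Raphson did not converge")


def fit_constant_hazard(times, events):
    """Toy model: log-likelihood = sum(delta_i * theta - exp(theta) * y_i)."""
    n_events = sum(events)
    total_time = sum(times)

    def score(theta):
        return n_events - math.exp(theta) * total_time

    def hessian(theta):
        return -math.exp(theta) * total_time

    theta_hat = newton_raphson(score, hessian, theta0=0.0)
    # Standard error from inverting the observed information.
    observed_information = math.exp(theta_hat) * total_time
    return theta_hat, math.sqrt(1.0 / observed_information)
```

In this toy case the maximizer has the closed form $\hat\theta = \log(\sum_i\Delta_i / \sum_i Y_i)$, so the iteration can be checked directly; because the score is smooth, no grid search or re-sampling is needed for either the estimate or its standard error.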

The proposed general sieve M-estimation theory can also be applied to 
other statistical models, for example, the single index model, the Cox model
with an unknown link function and the linear model under different
censoring mechanisms. Such research is ongoing and will be presented
elsewhere.

7. Proof of Theorem 4.2. Empirical process theory developed in [27, 28]
will be heavily involved in the proof. We use the symbol $\lesssim$ to denote
that the left-hand side is bounded above by a constant times the right-hand
side and $\gtrsim$ to denote that the left-hand side is bounded below by a
constant times the right-hand side. For notational simplicity, we drop the
superscript $*$ in the outer probability measure $P^*$ whenever an outer
probability applies.

7.1. Technical lemmas. We first introduce several lemmas that will be 
used for the proofs of Theorems 4.1, 4.2 and 4.3. Proofs of these lemmas are 
provided in the supplementary material [8]. 

Lemma 7.1. Under conditions (C.1)-(C.3) and (C.6), the log-likelihood
$$
l(\beta,\zeta(\cdot,\beta);Z) = \Delta g(e_0 - X'(\beta-\beta_0))
 - \int_a^b 1(e_0 \ge t)\exp\{g(t - X'(\beta-\beta_0))\}\,dt,
$$
where $e_0 = Y - X'\beta_0$, has bounded and continuous first and second
derivatives with respect to $\beta\in\mathcal{B}$ and
$\zeta(\cdot,\beta)\in\mathcal{H}^p$.
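For a single observation, the log-likelihood in Lemma 7.1 can be evaluated numerically once a candidate log-hazard $g$ is given. The Python sketch below uses the trapezoid rule; the function name `log_likelihood`, the grid size, and the argument `shift`, which stands for $X'(\beta-\beta_0)$, are assumptions introduced here for illustration.

```python
import math


def log_likelihood(delta, e0, shift, g, a, b, n_grid=2000):
    """Delta*g(e0 - shift) - int_a^b 1(e0 >= t) exp{g(t - shift)} dt.

    delta : censoring indicator (1 = event observed)
    e0    : residual Y - X'beta_0
    shift : X'(beta - beta_0), a scalar for this observation
    g     : callable log-hazard of the error term
    a, b  : integration limits
    """
    # The indicator 1(e0 >= t) truncates the integral at min(max(e0, a), b).
    upper = min(max(e0, a), b)
    h = (upper - a) / n_grid
    total = 0.0
    for k in range(n_grid + 1):
        weight = 0.5 if k in (0, n_grid) else 1.0  # trapezoid weights
        total += weight * math.exp(g(a + k * h - shift))
    return delta * g(e0 - shift) - h * total
```

With a constant log-hazard $g \equiv c$ the integral is available in closed form, $e^{c}\{\min(e_0,b) - a\}$ for $e_0 \ge a$, which gives a simple check of the numerical routine.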

Lemma 7.2. For $g_0\in\mathcal{G}^p$, there exists a function
$g_{0,n}\in\mathcal{G}_n^p$ such that
$$
\|g_{0,n} - g_0\|_\infty = O(n^{-p\nu}).
$$
Lemma 7.3. Let $\theta_{0,n} = (\beta_0,\zeta_{0,n}(\cdot,\beta_0))$ with
$\zeta_{0,n}(\cdot,\beta_0) = g_{0,n}$ defined in Lemma 7.2. Denote
$\mathcal{F}_n = \{l(\theta;z) - l(\theta_{0,n};z): \theta\in\Theta_n^p\}$.
Assume that conditions (C.1)-(C.3) and (C.6) hold; then the
$\varepsilon$-bracketing number associated with the $\|\cdot\|_\infty$ norm for
$\mathcal{F}_n$ is bounded by $(1/\varepsilon)^{cq_n+d}$, that is,
$N_{[\,]}(\varepsilon,\mathcal{F}_n,\|\cdot\|_\infty)\lesssim(1/\varepsilon)^{cq_n+d}$
for some constant $c>0$.

Lemma 7.4. Let $h_j^*(t,x,\beta) = w_j^*(\psi(t,x,\beta))$, where
$h_j^*(t,x,\beta_0) = w_j^*(t) = -\dot g_0(t)P(X_j \mid e_0\ge t)$, $j=1,\ldots,d$.
Assume conditions (C.1)-(C.6) hold; then there exists
$h_{j,n}^*(t,x,\beta) = w_{j,n}^*(\psi(t,x,\beta))\in\mathcal{H}_n^p$ such that
$\|h_{j,n}^* - h_j^*\|_\infty = O(n^{-2\nu})$, or equivalently,
$\|w_{j,n}^* - w_j^*\|_\infty = O(n^{-2\nu})$ where $w_{j,n}^*\in\mathcal{G}_n^p$.

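Lemma 7.5. For $h_j^*$ defined in Lemma 7.4, denote the class of functions
$$
\mathcal{F}_n^j(\eta) = \{\dot l_\zeta(\theta;z)[h_j^* - h_j]:
 \theta\in\Theta_n^p,\ h_j\in\mathcal{H}_n^p,\ d(\theta,\theta_0)\le\eta,\
 \|h_j - h_j^*\|_\infty\le\eta\}.
$$
Assume conditions (C.1)-(C.6) hold; then
$N_{[\,]}(\varepsilon,\mathcal{F}_n^j(\eta),\|\cdot\|_\infty)\lesssim(\eta/\varepsilon)^{cq_n+d}$
for some constant $c>0$.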

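Lemma 7.6. For $j = 1,\ldots,d$, define the following two classes of
functions:
$$
\mathcal{F}^{\beta}_{n,j}(\eta) = \{\dot l_{\beta_j}(\theta;z) - \dot l_{\beta_j}(\theta_0;z):
 \theta\in\Theta_n^p,\ d(\theta,\theta_0)\le\eta,\
 \|\dot g(\psi(\cdot,\beta)) - \dot g_0(\psi(\cdot,\beta_0))\|_2\le\eta\}
$$
and
$$
\mathcal{F}^{\zeta}_{n,j}(\eta) = \{\dot l_\zeta(\theta;z)[h_j^*(\cdot,\beta)]
 - \dot l_\zeta(\theta_0;z)[h_j^*(\cdot,\beta_0)]:
 \theta\in\Theta_n^p,\ d(\theta,\theta_0)\le\eta\},
$$
where $\dot l_{\beta_j}(\theta;Z)$ is the $j$th element of
$\dot l_\beta(\theta;Z)$, $\dot g(\cdot)$ denotes the derivative of $g(\cdot)$
and $h_j^*$ is defined in Lemma 7.5. Assume conditions (C.1)-(C.6) hold; then
$N_{[\,]}(\varepsilon,\mathcal{F}^{\beta}_{n,j}(\eta),\|\cdot\|_\infty)\lesssim(\eta/\varepsilon)^{c_1q_n+d}$
and
$N_{[\,]}(\varepsilon,\mathcal{F}^{\zeta}_{n,j}(\eta),\|\cdot\|_\infty)\lesssim(\eta/\varepsilon)^{c_2q_n+d}$
for some constants $c_1,c_2>0$.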

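7.2. Proof of Theorem 4.2. We prove the theorem by checking assumptions
(A1)-(A6) in Section 2. Here the criterion function of a single observation
is the log-likelihood function $l(\beta,\zeta(\cdot,\beta);Z)$. So instead of
$m$, we use $l$ to denote the criterion function. By Theorem 4.1 we know that
assumption (A1) holds with $\xi = \min(p\nu,(1-\nu)/2)$ and the norm
$\|\cdot\|_2$ defined in (4.1). Assumption (A2) automatically holds for the
scores. For (A3), we need to find an
$\mathbf{h}^* = (h_1^*,\ldots,h_d^*)'$ with
$\mathbf{h}^*(t,x,\beta_0) = \mathbf{w}^*(t)$ such that
$$
\begin{aligned}
&\dot S_{\beta\zeta}(\beta_0,\zeta_0(\cdot,\beta_0))[h]
 - \dot S_{\zeta\zeta}(\beta_0,\zeta_0(\cdot,\beta_0))[\mathbf{h}^*,h] \\
&\quad = P\{\ddot l_{\beta\zeta}(\beta_0,\zeta_0(\cdot,\beta_0);Z)[h]
 - \ddot l_{\zeta\zeta}(\beta_0,\zeta_0(\cdot,\beta_0);Z)[\mathbf{h}^*,h]\} = 0
\end{aligned}
$$
for all $h\in\mathbb{H}$ with $h(t,x,\beta) = w(t - x'(\beta-\beta_0))$. Note that
$$
\begin{aligned}
&P\{\ddot l_{\beta\zeta}(\beta_0,\zeta_0(\cdot,\beta_0);Z)[h]
 - \ddot l_{\zeta\zeta}(\beta_0,\zeta_0(\cdot,\beta_0);Z)[\mathbf{h}^*,h]\} \\
&\quad = P\Big\{-X\Big[\Delta\dot w(e_0)
 - \int_a^b 1(e_0\ge t)\exp\{g_0(t)\}\dot w(t)\,dt\Big] \\
&\qquad\quad + \int_a^b 1(e_0\ge t)\exp\{g_0(t)\}w(t)
 [X\dot g_0(t) + \mathbf{w}^*(t)]\,dt\Big\}.
\end{aligned}
$$
Since $P\{\dot l_\zeta(\beta_0,\zeta_0(\cdot,\beta_0);Z)[h] \mid X\} = 0$ for all
$h\in\mathbb{H}$, replacing $h(\cdot,\beta_0)$ by $\dot w$ we have
$$
\begin{aligned}
&P\Big\{-X\Big[\Delta\dot w(e_0)
 - \int_a^b 1(e_0\ge t)\exp\{g_0(t)\}\dot w(t)\,dt\Big]\Big\} \\
&\quad = P\Big\{-X\cdot P\Big[\Delta\dot w(e_0)
 - \int_a^b 1(e_0\ge t)\exp\{g_0(t)\}\dot w(t)\,dt \,\Big|\, X\Big]\Big\} \\
&\quad = P\{-X\cdot 0\} = 0.
\end{aligned}
$$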
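Hence we only need to find a $\mathbf{w}^*$ such that
$$
\begin{aligned}
&P\Big\{\int_a^b 1(e_0\ge t)\exp\{g_0(t)\}w(t)
 [X\dot g_0(t) + \mathbf{w}^*(t)]\,dt\Big\} \\
&\quad = \int_a^b \exp\{g_0(t)\}w(t)\{\dot g_0(t)P[1(e_0\ge t)X]
 + \mathbf{w}^*(t)P[1(e_0\ge t)]\}\,dt = 0.
\end{aligned}
$$
One obvious choice for $\mathbf{w}^*$ (or $\mathbf{h}^*$) is
$$
\mathbf{h}^*(t,x,\beta_0) = \mathbf{w}^*(t)
 = -\dot g_0(t)\frac{P[1(e_0\ge t)X]}{P[1(e_0\ge t)]}
 = -\dot g_0(t)P(X \mid e_0\ge t). \tag{7.1}
$$
Then it follows
$$
\begin{aligned}
&\dot l_\beta(\beta_0,\zeta_0(\cdot,\beta_0);Z)
 - \dot l_\zeta(\beta_0,\zeta_0(\cdot,\beta_0);Z)[\mathbf{h}^*] \\
&\quad = \Delta\{-\dot g_0(Y - X'\beta_0)\}\{X - P(X \mid e_0\ge Y - X'\beta_0)\} \\
&\qquad - \int 1(Y - X'\beta_0\ge t)\{X - P(X \mid e_0\ge t)\}
 \{-\dot g_0(t)\}\exp\{g_0(t)\}\,dt \\
&\quad = \int \{X - P(X \mid e_0\ge t)\}\{-\dot g_0(t)\}\,dM(t) \\
&\quad = l^*_{\beta_0}(Y,\Delta,X),
\end{aligned}
$$
which is the efficient score function for $\beta_0$ originally derived by [22],
where
$$
M(t) = \Delta I(Y - X'\beta_0\le t)
 - \int_{-\infty}^{t} I(Y - X'\beta_0\ge s)\exp\{g_0(s)\}\,ds.
$$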
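The conditional mean $P(X \mid e_0\ge t)$ appearing in (7.1) has a natural empirical analogue: the average covariate vector over the observations still at risk at residual time $t$. The Python sketch below illustrates this plug-in idea under the assumption that the residuals $Y_i - X_i'\beta_0$ and a derivative function for $g_0$ are supplied by the user; the names `at_risk_mean` and `w_star` are hypothetical and this is not presented as the authors' implementation.

```python
def at_risk_mean(residuals, covariates, t):
    """Empirical analogue of P(X | e_0 >= t).

    residuals  : list of residuals e_i = Y_i - X_i' beta_0
    covariates : list of covariate vectors X_i (lists of equal length)
    t          : residual time at which the risk set is formed
    """
    rows = [x for e, x in zip(residuals, covariates) if e >= t]
    if not rows:
        raise ValueError("empty risk set at t")
    dim = len(rows[0])
    return [sum(row[j] for row in rows) / len(rows) for j in range(dim)]


def w_star(residuals, covariates, t, g0_dot):
    """Plug-in version of w*(t) = -g0_dot(t) * P(X | e_0 >= t)."""
    slope = g0_dot(t)
    return [-slope * m for m in at_risk_mean(residuals, covariates, t)]
```

For instance, with residuals `[0.1, 0.5, -0.3, 0.9]` and `t = 0.2`, only the second and fourth observations are at risk, so the at-risk mean is the average of their covariate vectors.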
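By the fact of zero-mean for a score function, it is straightforward to verify
the following equalities:
$$
\begin{aligned}
P\ddot l_{\beta\zeta}(\beta,\zeta(\cdot,\beta);Z)[h]
 &= -P\{\dot l_\beta(\beta,\zeta(\cdot,\beta);Z)\,
 \dot l_\zeta(\beta,\zeta(\cdot,\beta);Z)[h]\}, \\
P\ddot l_{\zeta\beta}(\beta,\zeta(\cdot,\beta);Z)[h]
 &= -P\{\dot l_\zeta(\beta,\zeta(\cdot,\beta);Z)[h]\,
 \dot l_\beta'(\beta,\zeta(\cdot,\beta);Z)\}, \\
P\ddot l_{\beta\beta}(\beta,\zeta(\cdot,\beta);Z)
 &= -P\{\dot l_\beta(\beta,\zeta(\cdot,\beta);Z)\,
 \dot l_\beta'(\beta,\zeta(\cdot,\beta);Z)\}, \\
P\ddot l_{\zeta\zeta}(\beta,\zeta(\cdot,\beta);Z)[h_1,h_2]
 &= -P\{\dot l_\zeta(\beta,\zeta(\cdot,\beta);Z)[h_1]\,
 \dot l_\zeta(\beta,\zeta(\cdot,\beta);Z)[h_2]\}.
\end{aligned}
$$
Then together with the fact that
$$
P\{\ddot l_{\beta\zeta}(\beta_0,\zeta_0(\cdot,\beta_0);Z)[\mathbf{h}^*]
 - \ddot l_{\zeta\zeta}(\beta_0,\zeta_0(\cdot,\beta_0);Z)[\mathbf{h}^*,\mathbf{h}^*]\} = 0,
$$
the matrix $A$ in assumption (A3) of Theorem 2.1 is given by
$$
\begin{aligned}
A &= P\{-\ddot l_{\beta\beta}(\beta_0,\zeta_0(\cdot,\beta_0);Z)
 + \ddot l_{\zeta\beta}(\beta_0,\zeta_0(\cdot,\beta_0);Z)[\mathbf{h}^*] \\
&\qquad + \ddot l_{\beta\zeta}(\beta_0,\zeta_0(\cdot,\beta_0);Z)[\mathbf{h}^*]
 - \ddot l_{\zeta\zeta}(\beta_0,\zeta_0(\cdot,\beta_0);Z)[\mathbf{h}^*,\mathbf{h}^*]\} \\
&= P\{\dot l_\beta(\beta_0,\zeta_0(\cdot,\beta_0);Z)\,
 \dot l_\beta'(\beta_0,\zeta_0(\cdot,\beta_0);Z) \\
&\qquad - \dot l_\zeta(\beta_0,\zeta_0(\cdot,\beta_0);Z)[\mathbf{h}^*]\,
 \dot l_\beta'(\beta_0,\zeta_0(\cdot,\beta_0);Z) \\
&\qquad - \dot l_\beta(\beta_0,\zeta_0(\cdot,\beta_0);Z)\,
 \dot l_\zeta'(\beta_0,\zeta_0(\cdot,\beta_0);Z)[\mathbf{h}^*] \\
&\qquad + \dot l_\zeta(\beta_0,\zeta_0(\cdot,\beta_0);Z)[\mathbf{h}^*]\,
 \dot l_\zeta'(\beta_0,\zeta_0(\cdot,\beta_0);Z)[\mathbf{h}^*]\} \\
&= P\{\dot l_\beta(\beta_0,\zeta_0(\cdot,\beta_0);Z)
 - \dot l_\zeta(\beta_0,\zeta_0(\cdot,\beta_0);Z)[\mathbf{h}^*]\}^{\otimes 2},
\end{aligned}
$$
which is the information matrix for $\beta_0$.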
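Because the information matrix is the expected outer product of the efficient score, a sample version can be formed by averaging outer products of per-observation efficient score vectors and then inverted to give standard errors. The Python sketch below covers only the two-dimensional case with a closed-form inverse; the function names and the toy score vectors in the usage note are assumptions made for illustration.

```python
import math


def information_2x2(scores):
    """Average outer product of two-dimensional score vectors."""
    n = len(scores)
    a11 = sum(s[0] * s[0] for s in scores) / n
    a12 = sum(s[0] * s[1] for s in scores) / n
    a22 = sum(s[1] * s[1] for s in scores) / n
    return [[a11, a12], [a12, a22]]


def standard_errors_2x2(info, n):
    """sqrt(diag(info^{-1}) / n), using the closed-form 2x2 inverse."""
    det = info[0][0] * info[1][1] - info[0][1] ** 2
    if det <= 0:
        raise ValueError("information matrix is not positive definite")
    inverse_diagonal = (info[1][1] / det, info[0][0] / det)
    return [math.sqrt(v / n) for v in inverse_diagonal]
```

With the hypothetical score vectors `[[1, 0], [-1, 0], [0, 2], [0, -2]]`, the averaged outer product is diagonal with entries 0.5 and 2, and `standard_errors_2x2` divides the inverse diagonal by the sample size before taking square roots.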

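To verify (A4), we note that the first part automatically holds since
$\hat\beta_n$ satisfies the score equation
$S_{\beta,n}(\hat\beta_n,\hat\zeta_n(\cdot,\hat\beta_n))
 = P_n\dot l_\beta(\hat\beta_n,\hat\zeta_n(\cdot,\hat\beta_n);Z) = 0$.
Next we shall show that
$$
P_n\dot l_\zeta(\hat\beta_n,\hat\zeta_n(\cdot,\hat\beta_n);Z)[h_j^*] = o_p(n^{-1/2}),
$$
where $w_j^*(t) = -\dot g_0(t)P(X_j \mid e_0\ge t)$, $j = 1,\ldots,d$, is the
$j$th component of $\mathbf{w}^*(t)$ given in (7.1). According to Lemma 7.4,
there exists $h_{j,n}^*\in\mathcal{H}_n^p$ such that
$\|h_{j,n}^* - h_j^*\|_\infty = O(n^{-2\nu})$. Then by the score equation for
$\gamma$: $S_{\gamma,n}(\hat\beta_n,\hat\gamma_n)
 = P_n\dot l_\gamma(\hat\beta_n,\hat\gamma_n;Z) = 0$ and the fact that
$w_{j,n}^*(t)$ can be written as
$w_{j,n}^*(t) = \sum_{k=1}^{q_n}\gamma_{j,k}^*B_k(t)$ for some coefficients
$\{\gamma_{j,1}^*,\ldots,\gamma_{j,q_n}^*\}$ and the basis functions $B_k(t)$ of
the spline space, it follows that
$$
P_n\Big\{\Delta w_{j,n}^*(Y - X'\hat\beta_n)
 - \int 1(Y\ge t)\exp\{\hat\zeta_n(t,X,\hat\beta_n)\}
 w_{j,n}^*(t - X'\hat\beta_n)\,dt\Big\} = 0.
$$
So it suffices to show that for each $1\le j\le d$,
$$
I_n = P_n\dot l_\zeta(\hat\beta_n,\hat\zeta_n(\cdot,\hat\beta_n);Z)
 [h_j^* - h_{j,n}^*] = o_p(n^{-1/2}).
$$
Since $P\{\dot l_\zeta(\beta_0,\zeta_0(\cdot,\beta_0);Z)[h_j^* - h_{j,n}^*]\} = 0$,
we decompose $I_n$ into $I_n = I_{1n} + I_{2n}$, where
$$
I_{1n} = (P_n - P)\dot l_\zeta(\hat\beta_n,\hat\zeta_n(\cdot,\hat\beta_n);Z)
 [h_j^* - h_{j,n}^*]
$$
and
$$
I_{2n} = P\{\dot l_\zeta(\hat\beta_n,\hat\zeta_n(\cdot,\hat\beta_n);Z)[h_j^* - h_{j,n}^*]
 - \dot l_\zeta(\beta_0,\zeta_0(\cdot,\beta_0);Z)[h_j^* - h_{j,n}^*]\}.
$$
We will show that $I_{1n}$ and $I_{2n}$ are both $o_p(n^{-1/2})$.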

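First consider $I_{1n}$. According to Lemma 7.5, the $\varepsilon$-bracketing
number associated with the $\|\cdot\|_\infty$ norm for the class
$\mathcal{F}_n^j(\eta)$ defined in Lemma 7.5 is bounded by
$(\eta/\varepsilon)^{cq_n+d}$. This implies that
$$
\log N_{[\,]}(\varepsilon,\mathcal{F}_n^j(\eta),L_2(P))
 \le \log N_{[\,]}(\varepsilon,\mathcal{F}_n^j(\eta),\|\cdot\|_\infty)
 \lesssim q_n\log(\eta/\varepsilon),
$$
which leads to the bracketing integral
$$
J_{[\,]}(\eta,\mathcal{F}_n^j(\eta),L_2(P))
 = \int_0^\eta\sqrt{1 + \log N_{[\,]}(\varepsilon,\mathcal{F}_n^j(\eta),L_2(P))}\,d\varepsilon
 \lesssim q_n^{1/2}\eta.
$$
Now we pick $\eta$ to be $\eta_n = O\{n^{-\min(2\nu,(1-\nu)/2)}\}$; then
$$
\|h_j^* - h_{j,n}^*\|_\infty = O(n^{-2\nu}) \le O\{n^{-\min(2\nu,(1-\nu)/2)}\} = \eta_n,
$$
and since $p\ge 3$,
$$
d(\hat\theta_n,\theta_0) = O_p\{n^{-\min(p\nu,(1-\nu)/2)}\}
 \le O_p\{n^{-\min(2\nu,(1-\nu)/2)}\} = \eta_n.
$$
Therefore,
$\dot l_\zeta(\hat\beta_n,\hat\zeta_n(\cdot,\hat\beta_n);Z)[h_j^* - h_{j,n}^*]
 \in\mathcal{F}_n^j(\eta_n)$. Denote $t_\beta = t - X'(\beta-\beta_0)$ for
notational simplicity; for any
$\dot l_\zeta(\theta;Z)[h_j^* - h]\in\mathcal{F}_n^j(\eta_n)$, it follows that
$$
\begin{aligned}
&P\{\dot l_\zeta(\theta;Z)[h_j^* - h]\}^2 \\
&\quad = P\Big\{\Delta(w_j^* - w)(e_\beta)
 - \int_a^b 1(e_0\ge t)\exp\{g(t_\beta)\}(w_j^* - w)(t_\beta)\,dt\Big\}^2 \\
&\quad \lesssim \|w_j^* - w\|_\infty^2
 + P\Big\{\int_a^b \exp\{2g(t_\beta)\}(w_j^* - w)^2(t_\beta)\,dt\Big\} \\
&\quad \lesssim \|w_j^* - w\|_\infty^2
 + \|w_j^* - w\|_\infty^2\int_a^b P[\exp\{2g(t_\beta)\}]\,dt,
\end{aligned}
$$
where the first inequality holds because of the Cauchy-Schwarz inequality.
Since $\|w_j^* - w\|_\infty\le\eta_n$, by the same argument as ([25], page 591),
for slowly growing $c_n$ (their $l_n$), for example,
$c_n = o(\log(\eta_n^{-1}))$, we know that
$\|\dot l_\zeta(\theta;Z)[h_j^* - h]\|_\infty$ is bounded by some constant
$0 < M < \infty$ and $P\{\dot l_\zeta(\theta;Z)[h_j^* - h]\}^2\lesssim\eta_n^2$
for a slightly enlarged $\eta_n$ obtained by a fine adjustment of $\nu$. Then by
the maximal inequality in Lemma 3.4.2 of [28], it follows that
$$
E_P\|\mathbb{G}_n\|_{\mathcal{F}_n^j(\eta_n)}
 \lesssim q_n^{1/2}\eta_n + q_n n^{-1/2}
 = O\{n^{\max(-3\nu/2,\,\nu-1/2)}\} + O(n^{\nu-1/2}) = o(1),
$$
where the last equality holds because $0 < \nu < 1/2$. Thus by Markov's
inequality,
$I_{1n} = n^{-1/2}\mathbb{G}_n\dot l_\zeta(\hat\theta_n;Z)[h_j^* - h_{j,n}^*]
 = o_p(n^{-1/2})$.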

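Next for $I_{2n}$, the Taylor expansion for
$\dot l_\zeta(\hat\theta_n;Z)[h_j^* - h_{j,n}^*]$ at $\theta_0$ yields
$$
\begin{aligned}
&\dot l_\zeta(\hat\beta_n,\hat\zeta_n(\cdot,\hat\beta_n);Z)[h_j^* - h_{j,n}^*]
 - \dot l_\zeta(\beta_0,\zeta_0(\cdot,\beta_0);Z)[h_j^* - h_{j,n}^*] \\
&\quad = (\hat\beta_n - \beta_0)'\ddot l_{\beta\zeta}
 (\tilde\beta_n,\tilde\zeta_n(\cdot,\tilde\beta_n);Z)[h_j^* - h_{j,n}^*] \\
&\qquad + \ddot l_{\zeta\zeta}(\tilde\beta_n,\tilde\zeta_n(\cdot,\tilde\beta_n);Z)
 [h_j^* - h_{j,n}^*,\hat\zeta_n - \zeta_0],
\end{aligned}
$$
where $(\tilde\beta_n,\tilde\zeta_n(\cdot,\tilde\beta_n))$ is between
$(\beta_0,\zeta_0(\cdot,\beta_0))$ and
$(\hat\beta_n,\hat\zeta_n(\cdot,\hat\beta_n))$. Then it follows that
$$
\begin{aligned}
&|\ddot l_{\beta\zeta}(\tilde\beta_n,\tilde\zeta_n(\cdot,\tilde\beta_n);Z)
 [h_j^* - h_{j,n}^*]| \\
&\quad = \Big|-X\Big\{\Delta(\dot w_j^* - \dot w_{j,n}^*)(e_{\tilde\beta_n})
 - \int_a^b 1(e_0\ge t)\exp\{\tilde g_n(t_{\tilde\beta_n})\}
 [(\dot w_j^* - \dot w_{j,n}^*)(t_{\tilde\beta_n}) \\
&\qquad\qquad + (w_j^* - w_{j,n}^*)(t_{\tilde\beta_n})
 \dot{\tilde g}_n(t_{\tilde\beta_n})]\,dt\Big\}\Big| \\
&\quad \lesssim \|\dot w_j^* - \dot w_{j,n}^*\|_\infty
 + \|\dot w_j^* - \dot w_{j,n}^*\|_\infty
 \Big|\int_a^b \exp\{\tilde g_n(t_{\tilde\beta_n})\}\,dt\Big| \\
&\qquad + \|w_j^* - w_{j,n}^*\|_\infty
 \Big|\int_a^b \exp\{\tilde g_n(t_{\tilde\beta_n})\}
 \dot{\tilde g}_n(t_{\tilde\beta_n})\,dt\Big| \\
&\quad \lesssim \|\dot w_j^* - \dot w_{j,n}^*\|_\infty
 + \|w_j^* - w_{j,n}^*\|_\infty = O(n^{-\nu}),
\end{aligned}
$$
where the second inequality holds because $\tilde g_n$ and its first derivative
$\dot{\tilde g}_n$ are bounded (or growing with $n$ slowly enough so it can be
effectively treated as bounded based on the same argument of [25] on page 591),
and the last equality holds due to Corollary 6.21 of [23], which gives
$\|\dot w_j^* - \dot w_{j,n}^*\|_\infty = O(n^{-(2-1)\nu}) = O(n^{-\nu})$. Thus,
$$
P|(\hat\beta_n - \beta_0)'\ddot l_{\beta\zeta}
 (\tilde\beta_n,\tilde\zeta_n(\cdot,\tilde\beta_n);Z)[h_j^* - h_{j,n}^*]|
 = |\hat\beta_n - \beta_0|\cdot O(n^{-\nu})
 = O_p\{n^{-\min((p+1)\nu,(1+\nu)/2)}\}.
$$
Also,
$$
\begin{aligned}
&|\ddot l_{\zeta\zeta}(\tilde\beta_n,\tilde\zeta_n(\cdot,\tilde\beta_n);Z)
 [h_j^* - h_{j,n}^*,\hat\zeta_n - \zeta_0]| \\
&\quad = \Big|\int_a^b 1(e_0\ge t)\exp\{\tilde g_n(t_{\tilde\beta_n})\}
 (w_j^* - w_{j,n}^*)(t_{\tilde\beta_n})
 (\hat g_n - g_0)(t_{\tilde\beta_n})\,dt\Big| \\
&\quad \lesssim \|w_j^* - w_{j,n}^*\|_\infty
 \Big|\int_a^b \exp\{\tilde g_n(t_{\tilde\beta_n})\}
 (\hat g_n - g_0)(t_{\tilde\beta_n})\,dt\Big| \\
&\quad = \|w_j^* - w_{j,n}^*\|_\infty\cdot|B_n|.
\end{aligned}
$$
By the Cauchy-Schwarz inequality and the boundedness of $\tilde g_n$, we have
$$
\begin{aligned}
P(B_n)^2 &= P\Big\{\int_a^b \exp\{\tilde g_n(t_{\tilde\beta_n})\}
 (\hat g_n - g_0)(t_{\tilde\beta_n})\,dt\Big\}^2 \\
&\lesssim \int_{\mathcal{X}}\int_a^b (\hat g_n - g_0)^2(t_{\tilde\beta_n})
 \,d\Lambda_0(t)\,dF_X(x)
 = \|\hat\zeta_n(\cdot,\tilde\beta_n) - \zeta_0(\cdot,\tilde\beta_n)\|_2^2 \\
&\lesssim |\tilde\beta_n - \hat\beta_n|^2
 + \|\hat\zeta_n(\cdot,\hat\beta_n) - \zeta_0(\cdot,\beta_0)\|_2^2
 + |\beta_0 - \tilde\beta_n|^2 \\
&\lesssim |\hat\beta_n - \beta_0|^2
 + \|\hat\zeta_n(\cdot,\hat\beta_n) - \zeta_0(\cdot,\beta_0)\|_2^2
 = d(\hat\theta_n,\theta_0)^2.
\end{aligned}
$$
Hence $P|B_n|\lesssim d(\hat\theta_n,\theta_0)$ and
$$
P|\ddot l_{\zeta\zeta}(\tilde\beta_n,\tilde\zeta_n(\cdot,\tilde\beta_n);Z)
 [h_j^* - h_{j,n}^*,\hat\zeta_n - \zeta_0]|
 = O_p\{n^{-\min((p+2)\nu,(1+3\nu)/2)}\}.
$$
Since $\frac{1}{2(1+p)} < \nu < \frac{1}{2p}$, it follows that
$I_{2n} = O_p\{n^{-\min((p+1)\nu,(1+\nu)/2)}\} = o_p(n^{-1/2})$. Thus
$I_n = I_{1n} + I_{2n} = o_p(n^{-1/2})$, and condition (A4) holds.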

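Now we verify assumption (A5). First by Lemma 7.6, the
$\varepsilon$-bracketing numbers for the classes of functions
$\mathcal{F}^{\beta}_{n,j}(\eta)$ and $\mathcal{F}^{\zeta}_{n,j}(\eta)$ are both
bounded by $(\eta/\varepsilon)^{cq_n+d}$, which implies that the corresponding
$\varepsilon$-bracketing integrals are both bounded by $q_n^{1/2}\eta$, that is,
$$
J_{[\,]}(\eta,\mathcal{F}^{\beta}_{n,j}(\eta),L_2(P))\lesssim q_n^{1/2}\eta
\quad\text{and}\quad
J_{[\,]}(\eta,\mathcal{F}^{\zeta}_{n,j}(\eta),L_2(P))\lesssim q_n^{1/2}\eta.
$$
Then for $\dot l_{\beta_j}(\theta;z) - \dot l_{\beta_j}(\theta_0;z)$, by
applying the Cauchy-Schwarz inequality, together with subtracting and adding
the terms $\dot g(e_0)$, $e^{g_0(t_\beta)}\dot g(t_\beta)$,
$e^{g_0(t)}\dot g(t_\beta)$ and $e^{g_0(t)}\dot g_0(t_\beta)$, we have
$$
\begin{aligned}
&\{\dot l_{\beta_j}(\theta;Z) - \dot l_{\beta_j}(\theta_0;Z)\}^2 \\
&\quad = \Big\{-X_j\Delta[\dot g(e_\beta) - \dot g_0(e_0)]
 + X_j\int_a^b 1(e_0\ge t)[e^{g(t_\beta)}\dot g(t_\beta)
 - e^{g_0(t)}\dot g_0(t)]\,dt\Big\}^2 \\
&\quad \lesssim \{\Delta[\dot g(e_\beta) - \dot g_0(e_0)]^2\}
 + \Big\{\int_a^b [e^{g(t_\beta)}\dot g(t_\beta)
 - e^{g_0(t)}\dot g_0(t)]^2\,dt\Big\} \\
&\quad \lesssim \{\Delta[\dot g(e_\beta) - \dot g(e_0)]^2\}
 + \{\Delta[\dot g(e_0) - \dot g_0(e_0)]^2\} \\
&\qquad + \int_a^b \{[e^{g(t_\beta)} - e^{g_0(t_\beta)}]^2
 + [e^{g_0(t_\beta)} - e^{g_0(t)}]^2\}\dot g^2(t_\beta)\,dt \\
&\qquad + \int_a^b \{e^{2g_0(t)}[\dot g(t_\beta) - \dot g_0(t_\beta)]^2
 + e^{2g_0(t)}[\dot g_0(t_\beta) - \dot g_0(t)]^2\}\,dt \\
&\quad = B_1 + B_2 + B_3 + B_4.
\end{aligned}
$$
For $B_1$, since $\ddot g$ is bounded and the largest eigenvalue of $P(XX')$
satisfies $0 < \lambda_d < \infty$ by condition (C.2)(b), it follows that
$$
PB_1 \lesssim P[\ddot g(Y - X'\tilde\beta)X'(\beta - \beta_0)]^2
 \lesssim P[X'(\beta - \beta_0)]^2
 \le \lambda_d|\beta - \beta_0|^2 \lesssim |\beta - \beta_0|^2 \le \eta^2.
$$
For $B_2$, we have
$$
\begin{aligned}
PB_2 &\le \int_{\mathcal{X}}\Big\{\int_a^b (\dot g(t) - \dot g_0(t))^2
 \,d\Lambda_0(t)\Big\}\,dF_X(x)
 = \|\dot g(\psi(\cdot,\beta_0)) - \dot g_0(\psi(\cdot,\beta_0))\|_2^2 \\
&\lesssim |\beta - \beta_0|^2
 + \|\dot g(\psi(\cdot,\beta)) - \dot g_0(\psi(\cdot,\beta_0))\|_2^2
 \lesssim \eta^2.
\end{aligned}
$$
For $B_3$, by using the mean value theorem, it follows that
$$
\begin{aligned}
PB_3 &= P\Big\{\int_a^b \{[e^{\tilde g(t_\beta)}(g - g_0)(t_\beta)]^2
 + [e^{g_0(\tilde t_\beta)}\dot g_0(\tilde t_\beta)X'(\beta - \beta_0)]^2\}
 \dot g^2(t_\beta)\,dt\Big\} \\
&\lesssim \int_{\mathcal{X}}\int_a^b (g - g_0)^2(t_\beta)\,d\Lambda_0(t)\,dF_X(x)
 + P[X'(\beta - \beta_0)]^2 \\
&\lesssim \|\zeta(\cdot,\beta) - \zeta_0(\cdot,\beta_0)\|_2^2
 + |\beta - \beta_0|^2 \lesssim \eta^2,
\end{aligned}
$$
where $\tilde g = g_0 + \xi(g - g_0)$ for some $0 < \xi < 1$ and thus is
bounded. Finally for $B_4$, by the mean value theorem, it follows that
$$
\begin{aligned}
PB_4 &= P\Big\{\int_a^b \{e^{2g_0(t)}[\dot g(t_\beta) - \dot g_0(t_\beta)]^2
 + e^{2g_0(t)}[\dot g_0(t_\beta) - \dot g_0(t)]^2\}\,dt\Big\} \\
&\lesssim \int_{\mathcal{X}}\int_a^b (\dot g - \dot g_0)^2(t_\beta)
 \,d\Lambda_0(t)\,dF_X(x)
 + P\int_a^b [\ddot g_0(\tilde t_\beta)X'(\beta - \beta_0)]^2\,dt \\
&\lesssim \|\dot g(\psi(\cdot,\beta)) - \dot g_0(\psi(\cdot,\beta))\|_2^2
 + P[X'(\beta - \beta_0)]^2 \\
&\lesssim \|\dot g(\psi(\cdot,\beta)) - \dot g_0(\psi(\cdot,\beta_0))\|_2^2
 + |\beta - \beta_0|^2 \lesssim \eta^2.
\end{aligned}
$$
Therefore we have
$P\{\dot l_{\beta_j}(\theta;Z) - \dot l_{\beta_j}(\theta_0;Z)\}^2\lesssim\eta^2$.
Using a similar argument, we can show that
$P\{\dot l_\zeta(\theta;Z)[h_j^*] - \dot l_\zeta(\theta_0;Z)[h_j^*]\}^2\lesssim\eta^2$.
By Lemma 7.1, we also have that
$\|\dot l_{\beta_j}(\theta;Z) - \dot l_{\beta_j}(\theta_0;Z)\|_\infty$ and
$\|\dot l_\zeta(\theta;Z)[h_j^*] - \dot l_\zeta(\theta_0;Z)[h_j^*]\|_\infty$ are
both bounded. Now we pick $\eta$ as
$\eta_n = O\{n^{-\min((p-1)\nu,(1-\nu)/2)}\}$; then by the maximal inequality in
Lemma 3.4.2 of [28], it follows that
$$
E_P\|\mathbb{G}_n\|_{\mathcal{F}^{\beta}_{n,j}(\eta_n)}
 \lesssim q_n^{1/2}\eta_n + q_n n^{-1/2}
 = O\{n^{\max((3/2-p)\nu,\,\nu-1/2)}\} + O(n^{\nu-1/2}) = o(1),
$$
where the last equality holds since $p\ge 3$ and $\nu < 1/2$. Similarly, we have
$E_P\|\mathbb{G}_n\|_{\mathcal{F}^{\zeta}_{n,j}(\eta_n)} = o(1)$. Thus for
$\xi = \min(p\nu,(1-\nu)/2)$ and
$Cn^{-\xi} = O\{n^{-\min(p\nu,(1-\nu)/2)}\}\le\eta_n$, by Markov's inequality,
$$
\begin{aligned}
\sup_{d(\theta,\theta_0)\le Cn^{-\xi}}
 \mathbb{G}_n\{\dot l_{\beta_j}(\beta,\zeta(\cdot,\beta);Z)
 - \dot l_{\beta_j}(\beta_0,\zeta_0(\cdot,\beta_0);Z)\} &= o_p(1), \\
\sup_{d(\theta,\theta_0)\le Cn^{-\xi}}
 \mathbb{G}_n\{\dot l_\zeta(\beta,\zeta(\cdot,\beta);Z)[h_j^*]
 - \dot l_\zeta(\beta_0,\zeta_0(\cdot,\beta_0);Z)[h_j^*]\} &= o_p(1).
\end{aligned}
$$
This completes the verification of assumption (A5).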

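Finally, assumption (A6) can be verified by using the Taylor expansion.
Since the proofs for the two equations in (A6) are essentially identical, we
just prove the first equation. In a neighborhood of $\theta_0$:
$\{\theta: d(\theta,\theta_0)\le Cn^{-\xi},\ \theta\in\Theta_n^p\}$ with
$\xi = \min(p\nu,(1-\nu)/2)$, the Taylor expansion for $\dot l_\beta(\theta;Z)$
yields
$$
\begin{aligned}
\dot l_\beta(\theta;Z) &= \dot l_\beta(\theta_0;Z)
 + \ddot l_{\beta\beta}(\tilde\theta;Z)(\beta - \beta_0)
 + \ddot l_{\beta\zeta}(\tilde\theta;Z)[\zeta(\cdot,\beta) - \zeta_0(\cdot,\beta_0)] \\
&= \dot l_\beta(\theta_0;Z)
 + \ddot l_{\beta\beta}(\theta_0;Z)(\beta - \beta_0)
 + \ddot l_{\beta\zeta}(\theta_0;Z)[\zeta(\cdot,\beta) - \zeta_0(\cdot,\beta_0)] \\
&\quad + \{\ddot l_{\beta\beta}(\tilde\theta;Z)(\beta - \beta_0)
 - \ddot l_{\beta\beta}(\theta_0;Z)(\beta - \beta_0)\} \\
&\quad + \{\ddot l_{\beta\zeta}(\tilde\theta;Z)[\zeta(\cdot,\beta) - \zeta_0(\cdot,\beta_0)]
 - \ddot l_{\beta\zeta}(\theta_0;Z)[\zeta(\cdot,\beta) - \zeta_0(\cdot,\beta_0)]\},
\end{aligned}
$$
where $\tilde\theta = (\tilde\beta,\tilde\zeta(\cdot,\tilde\beta))$ is a midpoint
between $\theta_0$ and $\theta$. So
$$
\begin{aligned}
&P\{\dot l_\beta(\theta;Z) - \dot l_\beta(\theta_0;Z)
 - \ddot l_{\beta\beta}(\theta_0;Z)(\beta - \beta_0)
 - \ddot l_{\beta\zeta}(\theta_0;Z)[\zeta(\cdot,\beta) - \zeta_0(\cdot,\beta_0)]\} \\
&\quad = P\{(\ddot l_{\beta\beta}(\tilde\theta;Z)
 - \ddot l_{\beta\beta}(\theta_0;Z))(\beta - \beta_0)\} \\
&\qquad + P\{\ddot l_{\beta\zeta}(\tilde\theta;Z)[\zeta(\cdot,\beta) - \zeta_0(\cdot,\beta_0)]
 - \ddot l_{\beta\zeta}(\theta_0;Z)[\zeta(\cdot,\beta) - \zeta_0(\cdot,\beta_0)]\}.
\end{aligned}
$$
Then by direct calculation we have
$$
\begin{aligned}
&P|\ddot l_{\beta\beta}(\tilde\theta;Z) - \ddot l_{\beta\beta}(\theta_0;Z)| \\
&\quad \le P|XX'\Delta\{\ddot{\tilde g}(e_{\tilde\beta}) - \ddot g_0(e_0)\}| \\
&\qquad + P\Big|XX'\int_a^b 1(e_0\ge t)
 \{\exp\{\tilde g(t_{\tilde\beta})\}\ddot{\tilde g}(t_{\tilde\beta})
 - \exp\{g_0(t)\}\ddot g_0(t)\}\,dt\Big| \\
&\qquad + P\Big|XX'\int_a^b 1(e_0\ge t)
 \{\exp\{\tilde g(t_{\tilde\beta})\}\dot{\tilde g}^2(t_{\tilde\beta})
 - \exp\{g_0(t)\}\dot g_0^2(t)\}\,dt\Big| \\
&\quad \lesssim P|\Delta\{\ddot{\tilde g}(e_{\tilde\beta}) - \ddot g_0(e_0)\}| \\
&\qquad + P\Big\{\int_a^b |\exp\{\tilde g(t_{\tilde\beta})\}
 \ddot{\tilde g}(t_{\tilde\beta}) - \exp\{g_0(t)\}\ddot g_0(t)|\,dt\Big\} \\
&\qquad + P\Big\{\int_a^b |\exp\{\tilde g(t_{\tilde\beta})\}
 \dot{\tilde g}^2(t_{\tilde\beta}) - \exp\{g_0(t)\}\dot g_0^2(t)|\,dt\Big\} \\
&\quad = C_1 + C_2 + C_3.
\end{aligned}
$$
By applying a similar argument that we used before for verifying (A5) and
condition (C.6), we can show
$$
C_1 \lesssim |\tilde\beta - \beta_0|
 + \|\ddot{\tilde g}(\psi(\cdot,\tilde\beta)) - \ddot g_0(\psi(\cdot,\beta_0))\|_2.
$$
Similarly, we can show
$$
C_2 \lesssim |\tilde\beta - \beta_0|
 + \|\ddot{\tilde g}(\psi(\cdot,\tilde\beta)) - \ddot g_0(\psi(\cdot,\beta_0))\|_2
$$
and
$$
C_3 \lesssim |\tilde\beta - \beta_0|
 + \|\dot{\tilde g}(\psi(\cdot,\tilde\beta)) - \dot g_0(\psi(\cdot,\beta_0))\|_2,
$$
where $|\tilde\beta - \beta_0|\le Cn^{-\xi}$ with $\xi = \min(p\nu,(1-\nu)/2)$.
Therefore,
$$
P|\ddot l_{\beta\beta}(\tilde\theta;Z) - \ddot l_{\beta\beta}(\theta_0;Z)|
 = O\{n^{-\min((p-2)\nu,(1-\nu)/2)}\},
$$
and thus
$$
\begin{aligned}
P|\{\ddot l_{\beta\beta}(\tilde\theta;Z) - \ddot l_{\beta\beta}(\theta_0;Z)\}
 (\beta - \beta_0)|
 &= O\{n^{-\min(2(p-1)\nu,\,1/2+(p-5/2)\nu,\,1-\nu)}\} \\
 &= o(n^{-1/2}),
\end{aligned}
$$
where the last equality holds since $p\ge 3$, so
$2(p-1)\nu > \frac{p-1}{p+1}\ge\frac{1}{2}$,
$\frac{1}{2} + (p - \frac{5}{2})\nu > \frac{1}{2}$ and $1 - \nu > \frac{1}{2}$.
Similarly we can show
$$
\begin{aligned}
&P|\ddot l_{\beta\zeta}(\tilde\theta;Z)[\zeta(\cdot,\beta) - \zeta_0(\cdot,\beta_0)]
 - \ddot l_{\beta\zeta}(\theta_0;Z)[\zeta(\cdot,\beta) - \zeta_0(\cdot,\beta_0)]| \\
&\quad = O\{n^{-\min(2(p-1)\nu,\,1/2+(p-5/2)\nu,\,1-\nu)}\} = o(n^{-1/2}).
\end{aligned}
$$
Therefore, we have
$$
\begin{aligned}
&|P\{\dot l_\beta(\theta;Z) - \dot l_\beta(\theta_0;Z)
 - \ddot l_{\beta\beta}(\theta_0;Z)(\beta - \beta_0)
 - \ddot l_{\beta\zeta}(\theta_0;Z)[\zeta(\cdot,\beta) - \zeta_0(\cdot,\beta_0)]\}| \\
&\quad = O\{n^{-\min(2(p-1)\nu,\,1/2+(p-5/2)\nu,\,1-\nu)}\} = O(n^{-\alpha\xi}),
\end{aligned}
$$
where $\alpha = \min(2(p-1)\nu,\,\frac{1}{2}+(p-\frac{5}{2})\nu,\,1-\nu)
 /\min(p\nu,(1-\nu)/2) > 1$ and $\alpha\xi > 1/2$.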
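The exponent arithmetic in the verification of (A6) can be checked with exact rational numbers. The Python sketch below assumes $p\ge 3$ and $\frac{1}{2(1+p)} < \nu < \frac{1}{2p}$; this range for $\nu$, the grid of test values, and the function names are assumptions of the sketch rather than statements taken from the proof.

```python
from fractions import Fraction


def xi(p, nu):
    """Convergence-rate exponent xi = min(p*nu, (1 - nu)/2)."""
    return min(p * nu, (1 - nu) / 2)


def a6_exponent(p, nu):
    """Exponent min(2(p-1)nu, 1/2 + (p - 5/2)nu, 1 - nu) from (A6)."""
    return min(2 * (p - 1) * nu,
               Fraction(1, 2) + (p - Fraction(5, 2)) * nu,
               1 - nu)


def check_rates(p, n_grid=50):
    """Check exponent > 1/2 and alpha > 1 on a grid of nu values.

    The grid covers the open interval (1/(2(p+1)), 1/(2p)), which is the
    range of nu assumed in this sketch.
    """
    lower = Fraction(1, 2 * (p + 1))
    upper = Fraction(1, 2 * p)
    for k in range(1, n_grid):
        nu = lower + (upper - lower) * Fraction(k, n_grid)
        exponent = a6_exponent(p, nu)
        alpha = exponent / xi(p, nu)
        if not (exponent > Fraction(1, 2) and alpha > 1):
            return False
    return True
```

For example, with $p = 3$ and $\nu = 1/7$ the three candidate exponents are $4/7$, $4/7$ and $6/7$, so the minimum is $4/7 > 1/2$, while $\xi = 3/7$ and hence $\alpha = 4/3 > 1$.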
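Therefore, we have verified all six assumptions, and thus we have
$$
\sqrt{n}(\hat\beta_n - \beta_0)
 = A^{-1}\sqrt{n}\,P_n l^*_{\beta_0}(\beta_0,\zeta_0(\cdot,\beta_0);Z) + o_p(1)
 \to_d N(0, A^{-1}B(A^{-1})'),
$$
where $l^*_{\beta_0}(\theta_0;Z) = \dot l_\beta(\theta_0;Z)
 - \dot l_\zeta(\theta_0;Z)[\mathbf{h}^*]$ is the efficient score function for
$\beta_0$ and $A = P\{l^*_{\beta_0}(Y,\Delta,X)\}^{\otimes 2} = I(\beta_0)$, which
is shown when verifying (A3). Hence $A = B$ and
$A^{-1}B(A^{-1})' = A^{-1} = I^{-1}(\beta_0)$, and
$$
\sqrt{n}(\hat\beta_n - \beta_0)\to_d N(0, I^{-1}(\beta_0)).
$$
Thus we complete the proof of Theorem 4.2.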
SUPPLEMENTARY MATERIAL

Additional proofs (DOI: 10.1214/11-AOS934SUPP; .pdf). The supplementary
document contains proofs of technical lemmas and Theorems 4.1
and 4.3.